* Re: [PATCH] btrfs file write debugging patch @ 2011-03-01 16:36 Xin Zhong 2011-03-01 21:09 ` Mitch Harder 0 siblings, 1 reply; 40+ messages in thread From: Xin Zhong @ 2011-03-01 16:36 UTC (permalink / raw) To: linux-btrfs Hi, Mitch I think you can config ftrace to just trace function calls of btrfs.ko which will save a lot of trace buffer space. See below command: #echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filterAnd please send out the full ftrace log again. Another helpful information might be the strace log of the wmldbcreate process. It will show us the io pattern of this command. Thanks a lot for your help! ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-01 16:36 [PATCH] btrfs file write debugging patch Xin Zhong @ 2011-03-01 21:09 ` Mitch Harder 2011-03-02 10:58 ` Zhong, Xin 0 siblings, 1 reply; 40+ messages in thread From: Mitch Harder @ 2011-03-01 21:09 UTC (permalink / raw) To: Xin Zhong; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 1720 bytes --] 2011/3/1 Xin Zhong <thierryzhong@hotmail.com>: > > Hi, Mitch > I think you can config ftrace to just trace function calls of btrfs.ko which will save a lot of trace buffer space. See below command: > #echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filterAnd please send out the full ftrace log again. > > Another helpful information might be the strace log of the wmldbcreate process. It will show us the io pattern of this command. > Thanks a lot for your help! > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > I manually ran an strace around the build command (wmldbcreate) that is causing my problem, and I am attaching the strace for that. Please note that wmldbcreate does not seem to care when an error is returned, and continues on. So the error is occurring somewhat silently in the middle, and isn't the last item. The error is probably associated with one of the 12288 byte writes. I have re-run an ftrace following the conditions above, and have hosted that file (~1.1MB compressed) on my local server at: http://dontpanic.dyndns.org/trace-openmotif-btrfs-v15.gz Please note I am still using some debugging modifications of my own to file.c. They server the purpose of: (1) Avoiding an infinite loop by identifying when the problem is occuring, and exiting with error after 256 loops. (2) Stopping the trace after exiting to keep from flooding the ftrace buffer. (3) Provide debugging comments (all prefaced with "TPK:" in the trace). Let me know if you want me to change any of the conditions. [-- Attachment #2: wmldbcreate-strace.gz --] [-- Type: application/x-gzip, Size: 4406 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-01 21:09 ` Mitch Harder @ 2011-03-02 10:58 ` Zhong, Xin 2011-03-02 14:00 ` Xin Zhong 2011-03-04 1:51 ` Chris Mason 0 siblings, 2 replies; 40+ messages in thread From: Zhong, Xin @ 2011-03-02 10:58 UTC (permalink / raw) To: Mitch Harder, Xin Zhong; +Cc: linux-btrfs@vger.kernel.org I downloaded openmotif and run the command as Mitch mentioned and was a= ble to recreate the problem locally. And I managed to simplify the comm= and into a very simple program which can capture the problem easily. Se= e below code: #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> static char a[4096*3]; int main() { int fd =3D open("out", O_WRONLY|O_CREAT|O_TRUNC, 0666); write(fd,a+1, 4096*2); exit(0); } It seems that if we give an unaligned address to btrfs write and the bu= ffer reside on more than 2 pages. It will trigger this bug. If we give an aligned address to btrfs write, it works well no matter h= ow many pages are given.=20 I use ftrace to observe it. It seems iov_iter_fault_in_readable do not = trigger pagefault handling when the address is not aligned. I do not qu= ite understand the reason behind it. But the solution should be to proc= ess the page one by one. And that's also what generic file write routin= e does.=20 Any suggestion are welcomed. Thanks! -----Original Message----- =46rom: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vge= r.kernel.org] On Behalf Of Mitch Harder Sent: Wednesday, March 02, 2011 5:09 AM To: Xin Zhong Cc: linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs file write debugging patch 2011/3/1 Xin Zhong <thierryzhong@hotmail.com>: > > Hi, Mitch > I think you can config ftrace to just trace function calls of btrfs.k= o which will save a lot of trace buffer space. See below command: > #echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filterAnd p= lease send out the full ftrace log again. > > Another helpful information might be the strace log of the wmldbcreat= e process. It will show us the io pattern of this command. > Thanks a lot for your help! > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs= "=20 > in the body of a message to majordomo@vger.kernel.org More majordomo=20 > info at =A0http://vger.kernel.org/majordomo-info.html > I manually ran an strace around the build command (wmldbcreate) that is= causing my problem, and I am attaching the strace for that. Please note that wmldbcreate does not seem to care when an error is ret= urned, and continues on. So the error is occurring somewhat silently i= n the middle, and isn't the last item. The error is probably associate= d with one of the 12288 byte writes. I have re-run an ftrace following the conditions above, and have hosted= that file (~1.1MB compressed) on my local server at: http://dontpanic.dyndns.org/trace-openmotif-btrfs-v15.gz Please note I am still using some debugging modifications of my own to = file.c. They server the purpose of: (1) Avoiding an infinite loop by identifying when the problem is occuri= ng, and exiting with error after 256 loops. (2) Stopping the trace after exiting to keep from flooding the ftrace b= uffer. (3) Provide debugging comments (all prefaced with "TPK:" in the trace). Let me know if you want me to change any of the conditions. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-02 10:58 ` Zhong, Xin @ 2011-03-02 14:00 ` Xin Zhong 2011-03-04 1:51 ` Chris Mason 1 sibling, 0 replies; 40+ messages in thread From: Xin Zhong @ 2011-03-02 14:00 UTC (permalink / raw) To: xin.zhong, Mitch Harder; +Cc: linux-btrfs Sorry, I forgot to mention that you need to undo below commit in btrfs-unstable to recreate the problem: Btrfs: fix fiemap bugs with delalloc (+224/-42) Otherwise, it will run into enospc error. I am not sure if it's the same problem. ---------------------------------------- > From: xin.zhong@intel.com > To: mitch.harder@sabayonlinux.org; thierryzhong@hotmail.com > CC: linux-btrfs@vger.kernel.org > Date: Wed, 2 Mar 2011 18:58:49 +0800 > Subject: RE: [PATCH] btrfs file write debugging patch > > I downloaded openmotif and run the command as Mitch mentioned and was able to recreate the problem locally. And I managed to simplify the command into a very simple program which can capture the problem easily. See below code: > > #include > #include > #include > static char a[4096*3]; > int main() > { > int fd = open("out", O_WRONLY|O_CREAT|O_TRUNC, 0666); > write(fd,a+1, 4096*2); > exit(0); > } > > It seems that if we give an unaligned address to btrfs write and the buffer reside on more than 2 pages. It will trigger this bug. > If we give an aligned address to btrfs write, it works well no matter how many pages are given. > > I use ftrace to observe it. It seems iov_iter_fault_in_readable do not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it. But the solution should be to process the page one by one. And that's also what generic file write routine does. > > Any suggestion are welcomed. Thanks! > > -----Original Message----- > From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Mitch Harder > Sent: Wednesday, March 02, 2011 5:09 AM > To: Xin Zhong > Cc: linux-btrfs@vger.kernel.org > Subject: Re: [PATCH] btrfs file write debugging patch > > 2011/3/1 Xin Zhong : > > > > Hi, Mitch > > I think you can config ftrace to just trace function calls of btrfs.ko which will save a lot of trace buffer space. See below command: > > #echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filterAnd please send out the full ftrace log again. > > > > Another helpful information might be the strace log of the wmldbcreate process. It will show us the io pattern of this command. > > Thanks a lot for your help! > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" > > in the body of a message to majordomo@vger.kernel.org More majordomo > > info at http://vger.kernel.org/majordomo-info.html > > > > I manually ran an strace around the build command (wmldbcreate) that is causing my problem, and I am attaching the strace for that. > > Please note that wmldbcreate does not seem to care when an error is returned, and continues on. So the error is occurring somewhat silently in the middle, and isn't the last item. The error is probably associated with one of the 12288 byte writes. > > I have re-run an ftrace following the conditions above, and have hosted that file (~1.1MB compressed) on my local server at: > > http://dontpanic.dyndns.org/trace-openmotif-btrfs-v15.gz > > Please note I am still using some debugging modifications of my own to file.c. > > They server the purpose of: > (1) Avoiding an infinite loop by identifying when the problem is occuring, and exiting with error after 256 loops. > (2) Stopping the trace after exiting to keep from flooding the ftrace buffer. > (3) Provide debugging comments (all prefaced with "TPK:" in the trace). > > Let me know if you want me to change any of the conditions. > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-02 10:58 ` Zhong, Xin 2011-03-02 14:00 ` Xin Zhong @ 2011-03-04 1:51 ` Chris Mason 2011-03-04 2:32 ` Josef Bacik 2011-03-04 12:19 ` Chris Mason 1 sibling, 2 replies; 40+ messages in thread From: Chris Mason @ 2011-03-04 1:51 UTC (permalink / raw) To: Zhong, Xin; +Cc: Mitch Harder, Xin Zhong, linux-btrfs@vger.kernel.org Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500: > I downloaded openmotif and run the command as Mitch mentioned and was able to recreate the problem locally. And I managed to simplify the command into a very simple program which can capture the problem easily. See below code: > > #include <sys/types.h> > #include <sys/stat.h> > #include <fcntl.h> > static char a[4096*3]; > int main() > { > int fd = open("out", O_WRONLY|O_CREAT|O_TRUNC, 0666); > write(fd,a+1, 4096*2); > exit(0); > } > > It seems that if we give an unaligned address to btrfs write and the buffer reside on more than 2 pages. It will trigger this bug. > If we give an aligned address to btrfs write, it works well no matter how many pages are given. > > I use ftrace to observe it. It seems iov_iter_fault_in_readable do not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it. But the solution should be to process the page one by one. And that's also what generic file write routine does. > > Any suggestion are welcomed. Thanks! Great job guys. I'm using this on top of my debugging patch. It passes the unaligned test but I'll give it a real run tonight and look for other problems. (This is almost entirely untested, please don't use it quite yet) -chris diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 89a6a26..6a44add 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, copied = btrfs_copy_from_user(pos, num_pages, write_bytes, pages, &i); + + /* + * if we have trouble faulting in the pages, fall + * back to one page at a time + */ + if (copied < write_bytes) + nrptrs = 1; + if (copied == 0) dirty_pages = 0; else ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-04 1:51 ` Chris Mason @ 2011-03-04 2:32 ` Josef Bacik 2011-03-04 2:42 ` Zhong, Xin 2011-03-04 12:19 ` Chris Mason 1 sibling, 1 reply; 40+ messages in thread From: Josef Bacik @ 2011-03-04 2:32 UTC (permalink / raw) To: Chris Mason Cc: Zhong, Xin, Mitch Harder, Xin Zhong, linux-btrfs@vger.kernel.org On Thu, Mar 03, 2011 at 08:51:55PM -0500, Chris Mason wrote: > Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500: > > I downloaded openmotif and run the command as Mitch mentioned and was able to recreate the problem locally. And I managed to simplify the command into a very simple program which can capture the problem easily. See below code: > > > > #include <sys/types.h> > > #include <sys/stat.h> > > #include <fcntl.h> > > static char a[4096*3]; > > int main() > > { > > int fd = open("out", O_WRONLY|O_CREAT|O_TRUNC, 0666); > > write(fd,a+1, 4096*2); > > exit(0); > > } > > > > It seems that if we give an unaligned address to btrfs write and the buffer reside on more than 2 pages. It will trigger this bug. > > If we give an aligned address to btrfs write, it works well no matter how many pages are given. > > > > I use ftrace to observe it. It seems iov_iter_fault_in_readable do not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it. But the solution should be to process the page one by one. And that's also what generic file write routine does. > > > > Any suggestion are welcomed. Thanks! > > Great job guys. I'm using this on top of my debugging patch. It passes > the unaligned test but I'll give it a real run tonight and look for > other problems. > > (This is almost entirely untested, please don't use it quite yet) > > -chris > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index 89a6a26..6a44add 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, > > copied = btrfs_copy_from_user(pos, num_pages, > write_bytes, pages, &i); > + > + /* > + * if we have trouble faulting in the pages, fall > + * back to one page at a time > + */ > + if (copied < write_bytes) > + nrptrs = 1; > + > if (copied == 0) > dirty_pages = 0; > else Btw this situation is taken care of in my write path rewrite patch, if copied == 0 we switch to one segment at a time. Thanks, Josef ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-04 2:32 ` Josef Bacik @ 2011-03-04 2:42 ` Zhong, Xin 2011-03-04 2:41 ` Josef Bacik 0 siblings, 1 reply; 40+ messages in thread From: Zhong, Xin @ 2011-03-04 2:42 UTC (permalink / raw) To: Josef Bacik, Chris Mason Cc: Mitch Harder, Xin Zhong, linux-btrfs@vger.kernel.org Where can I find your patch? Thanks! -----Original Message----- From: Josef Bacik [mailto:josef@redhat.com] Sent: Friday, March 04, 2011 10:32 AM To: Chris Mason Cc: Zhong, Xin; Mitch Harder; Xin Zhong; linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs file write debugging patch On Thu, Mar 03, 2011 at 08:51:55PM -0500, Chris Mason wrote: > Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500: > > I downloaded openmotif and run the command as Mitch mentioned and was able to recreate the problem locally. And I managed to simplify the command into a very simple program which can capture the problem easily. See below code: > > > > #include <sys/types.h> > > #include <sys/stat.h> > > #include <fcntl.h> > > static char a[4096*3]; > > int main() > > { > > int fd = open("out", O_WRONLY|O_CREAT|O_TRUNC, 0666); > > write(fd,a+1, 4096*2); > > exit(0); > > } > > > > It seems that if we give an unaligned address to btrfs write and the buffer reside on more than 2 pages. It will trigger this bug. > > If we give an aligned address to btrfs write, it works well no matter how many pages are given. > > > > I use ftrace to observe it. It seems iov_iter_fault_in_readable do not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it. But the solution should be to process the page one by one. And that's also what generic file write routine does. > > > > Any suggestion are welcomed. Thanks! > > Great job guys. I'm using this on top of my debugging patch. It passes > the unaligned test but I'll give it a real run tonight and look for > other problems. > > (This is almost entirely untested, please don't use it quite yet) > > -chris > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index 89a6a26..6a44add 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, > > copied = btrfs_copy_from_user(pos, num_pages, > write_bytes, pages, &i); > + > + /* > + * if we have trouble faulting in the pages, fall > + * back to one page at a time > + */ > + if (copied < write_bytes) > + nrptrs = 1; > + > if (copied == 0) > dirty_pages = 0; > else Btw this situation is taken care of in my write path rewrite patch, if copied == 0 we switch to one segment at a time. Thanks, Josef ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-04 2:42 ` Zhong, Xin @ 2011-03-04 2:41 ` Josef Bacik 2011-03-04 8:41 ` Zhong, Xin 2011-03-05 16:56 ` Mitch Harder 0 siblings, 2 replies; 40+ messages in thread From: Josef Bacik @ 2011-03-04 2:41 UTC (permalink / raw) To: Zhong, Xin Cc: Josef Bacik, Chris Mason, Mitch Harder, Xin Zhong, linux-btrfs@vger.kernel.org On Fri, Mar 04, 2011 at 10:42:46AM +0800, Zhong, Xin wrote: > Where can I find your patch? Thanks! > It's in my btrfs-work git tree, it's based on the latest git pull from linus so you can just pull it onto a linus tree and you should be good to go. The specific patch is Btrfs: simplify our write path Thanks, Josef ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-04 2:41 ` Josef Bacik @ 2011-03-04 8:41 ` Zhong, Xin 2011-03-05 16:56 ` Mitch Harder 1 sibling, 0 replies; 40+ messages in thread From: Zhong, Xin @ 2011-03-04 8:41 UTC (permalink / raw) To: Josef Bacik Cc: Chris Mason, Mitch Harder, Xin Zhong, linux-btrfs@vger.kernel.org Looks good to me. Thanks! Another mysterious thing is that this problem can only be recreated on x86 32bit system. I can not recreate it on x86_64 system using my test case. Any one have any idea about it? -----Original Message----- From: Josef Bacik [mailto:josef@redhat.com] Sent: Friday, March 04, 2011 10:41 AM To: Zhong, Xin Cc: Josef Bacik; Chris Mason; Mitch Harder; Xin Zhong; linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs file write debugging patch On Fri, Mar 04, 2011 at 10:42:46AM +0800, Zhong, Xin wrote: > Where can I find your patch? Thanks! > It's in my btrfs-work git tree, it's based on the latest git pull from linus so you can just pull it onto a linus tree and you should be good to go. The specific patch is Btrfs: simplify our write path Thanks, Josef ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-04 2:41 ` Josef Bacik 2011-03-04 8:41 ` Zhong, Xin @ 2011-03-05 16:56 ` Mitch Harder 2011-03-05 17:28 ` Xin Zhong 1 sibling, 1 reply; 40+ messages in thread From: Mitch Harder @ 2011-03-05 16:56 UTC (permalink / raw) To: Josef Bacik Cc: Zhong, Xin, Chris Mason, Xin Zhong, linux-btrfs@vger.kernel.org On Thu, Mar 3, 2011 at 8:41 PM, Josef Bacik <josef@redhat.com> wrote: > On Fri, Mar 04, 2011 at 10:42:46AM +0800, Zhong, Xin wrote: >> Where can I find your patch? Thanks! >> > > It's in my btrfs-work git tree, it's based on the latest git pull fro= m linus so > you can just pull it onto a linus tree and you should be good to go. = =A0The > specific patch is > > Btrfs: simplify our write path > > Thanks, > > Josef > Josef: I've been testing the kernel from you git tree (as of commit 5555f192 Btrfs: add a comment explaining what btrfs_cont_expand does). I am still running into issues when running the portion of Openmotif that has been generating the problem with page faults when given an unaligned address. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-05 16:56 ` Mitch Harder @ 2011-03-05 17:28 ` Xin Zhong 0 siblings, 0 replies; 40+ messages in thread From: Xin Zhong @ 2011-03-05 17:28 UTC (permalink / raw) To: Mitch Harder, josef; +Cc: xin.zhong, chris.mason, linux-btrfs I think Josef's patch only address the one-by-one page processing issue. But do not address the issue that dirty_pages should be set to 0 when copied is 0. ---------------------------------------- > Date: Sat, 5 Mar 2011 10:56:52 -0600 > Subject: Re: [PATCH] btrfs file write debugging patch > From: mitch.harder@sabayonlinux.org > To: josef@redhat.com > CC: xin.zhong@intel.com; chris.mason@oracle.com; thierryzhong@hotmail.com; linux-btrfs@vger.kernel.org > > On Thu, Mar 3, 2011 at 8:41 PM, Josef Bacik wrote: > > On Fri, Mar 04, 2011 at 10:42:46AM +0800, Zhong, Xin wrote: > >> Where can I find your patch? Thanks! > >> > > > > It's in my btrfs-work git tree, it's based on the latest git pull from linus so > > you can just pull it onto a linus tree and you should be good to go. The > > specific patch is > > > > Btrfs: simplify our write path > > > > Thanks, > > > > Josef > > > > Josef: > > I've been testing the kernel from you git tree (as of commit 5555f192 > Btrfs: add a comment explaining what btrfs_cont_expand does). > > I am still running into issues when running the portion of Openmotif > that has been generating the problem with page faults when given an > unaligned address. ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-04 1:51 ` Chris Mason 2011-03-04 2:32 ` Josef Bacik @ 2011-03-04 12:19 ` Chris Mason 2011-03-04 14:25 ` Xin Zhong 1 sibling, 1 reply; 40+ messages in thread From: Chris Mason @ 2011-03-04 12:19 UTC (permalink / raw) To: Chris Mason Cc: Zhong, Xin, Mitch Harder, Xin Zhong, linux-btrfs@vger.kernel.org Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500: > Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500: > > It seems that if we give an unaligned address to btrfs write and the buffer reside on more than 2 pages. It will trigger this bug. > > If we give an aligned address to btrfs write, it works well no matter how many pages are given. > > > > I use ftrace to observe it. It seems iov_iter_fault_in_readable do not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it. But the solution should be to process the page one by one. And that's also what generic file write routine does. > > > > Any suggestion are welcomed. Thanks! > > Great job guys. I'm using this on top of my debugging patch. It passes > the unaligned test but I'll give it a real run tonight and look for > other problems. > > (This is almost entirely untested, please don't use it quite yet) > > -chris > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index 89a6a26..6a44add 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, > > copied = btrfs_copy_from_user(pos, num_pages, > write_bytes, pages, &i); > + > + /* > + * if we have trouble faulting in the pages, fall > + * back to one page at a time > + */ > + if (copied < write_bytes) > + nrptrs = 1; > + > if (copied == 0) > dirty_pages = 0; > else Ok, this is working well for me. Anyone see any problems with it? ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-04 12:19 ` Chris Mason @ 2011-03-04 14:25 ` Xin Zhong 2011-03-04 15:33 ` Mitch Harder 0 siblings, 1 reply; 40+ messages in thread From: Xin Zhong @ 2011-03-04 14:25 UTC (permalink / raw) To: chris.mason; +Cc: xin.zhong, Mitch Harder, linux-btrfs It works well for me too. ---------------------------------------- > From: chris.mason@oracle.com > To: chris.mason@oracle.com > CC: xin.zhong@intel.com; mitch.harder@sabayonlinux.org; thierryzhong@hotmail.com; linux-btrfs@vger.kernel.org > Subject: RE: [PATCH] btrfs file write debugging patch > Date: Fri, 4 Mar 2011 07:19:39 -0500 > > Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500: > > Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500: > > > It seems that if we give an unaligned address to btrfs write and the buffer reside on more than 2 pages. It will trigger this bug. > > > If we give an aligned address to btrfs write, it works well no matter how many pages are given. > > > > > > I use ftrace to observe it. It seems iov_iter_fault_in_readable do not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it. But the solution should be to process the page one by one. And that's also what generic file write routine does. > > > > > > Any suggestion are welcomed. Thanks! > > > > Great job guys. I'm using this on top of my debugging patch. It passes > > the unaligned test but I'll give it a real run tonight and look for > > other problems. > > > > (This is almost entirely untested, please don't use it quite yet) > > > > > -chris > > > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > > index 89a6a26..6a44add 100644 > > --- a/fs/btrfs/file.c > > +++ b/fs/btrfs/file.c > > @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, > > > > copied = btrfs_copy_from_user(pos, num_pages, > > write_bytes, pages, &i); > > + > > + /* > > + * if we have trouble faulting in the pages, fall > > + * back to one page at a time > > + */ > > + if (copied < write_bytes) > > + nrptrs = 1; > > + > > if (copied == 0) > > dirty_pages = 0; > > else > > Ok, this is working well for me. Anyone see any problems with it? > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-04 14:25 ` Xin Zhong @ 2011-03-04 15:33 ` Mitch Harder 2011-03-04 17:21 ` Mitch Harder 0 siblings, 1 reply; 40+ messages in thread From: Mitch Harder @ 2011-03-04 15:33 UTC (permalink / raw) To: Xin Zhong; +Cc: chris.mason, xin.zhong, linux-btrfs 2011/3/4 Xin Zhong <thierryzhong@hotmail.com>: > > It works well for me too. > > ---------------------------------------- >> From: chris.mason@oracle.com >> To: chris.mason@oracle.com >> CC: xin.zhong@intel.com; mitch.harder@sabayonlinux.org; thierryzhong@hotmail.com; linux-btrfs@vger.kernel.org >> Subject: RE: [PATCH] btrfs file write debugging patch >> Date: Fri, 4 Mar 2011 07:19:39 -0500 >> >> Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500: >> > Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500: >> > > It seems that if we give an unaligned address to btrfs write and the buffer reside on more than 2 pages. It will trigger this bug. >> > > If we give an aligned address to btrfs write, it works well no matter how many pages are given. >> > > >> > > I use ftrace to observe it. It seems iov_iter_fault_in_readable do not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it. But the solution should be to process the page one by one. And that's also what generic file write routine does. >> > > >> > > Any suggestion are welcomed. Thanks! >> > >> > Great job guys. I'm using this on top of my debugging patch. It passes >> > the unaligned test but I'll give it a real run tonight and look for >> > other problems. >> > >> > (This is almost entirely untested, please don't use it quite yet) >> >> > >> > -chris >> > >> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c >> > index 89a6a26..6a44add 100644 >> > --- a/fs/btrfs/file.c >> > +++ b/fs/btrfs/file.c >> > @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, >> > >> > copied = btrfs_copy_from_user(pos, num_pages, >> > write_bytes, pages, &i); >> > + >> > + /* >> > + * if we have trouble faulting in the pages, fall >> > + * back to one page at a time >> > + */ >> > + if (copied < write_bytes) >> > + nrptrs = 1; >> > + >> > if (copied == 0) >> > dirty_pages = 0; >> > else >> >> Ok, this is working well for me. Anyone see any problems with it? >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > I've applied this patch on top of the debugging patch at the head of the thread, and I'm having trouble building gcc now. When building gcc-4.4.5, I get errors like the following: Comparing stages 2 and 3 Bootstrap comparison failure! ./cp/call.o differs ./cp/decl.o differs ./cp/pt.o differs ./cp/class.o differs ./cp/decl2.o differs <....snip.....> ./matrix-reorg.o differs ./tree-inline.o differs ./gcc.o differs ./gcc-options.o differs make[2]: *** [compare] Error 1 make[1]: *** [stage3-bubble] Error 2 make: *** [bootstrap-lean] Error 2 emake failed I've went back and rebuilt my kernel without these two debugging patches, and gcc-4.4.5 builds without error on that kernel. I haven't yet tested building gcc-4.4.5 with just the debugging patch at the head of the thread, so I'll test that, and report back. But I was wondering if anybody else can replicate this issue. BTW, I've been doing most of my testing on an x86 system. My x86_64 systems haven't had as much trouble, but I haven't been robustingly checking my x86_64 systems for these issues. I noticed that page fault handling is different by architecture. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-04 15:33 ` Mitch Harder @ 2011-03-04 17:21 ` Mitch Harder 2011-03-05 1:00 ` Xin Zhong 0 siblings, 1 reply; 40+ messages in thread From: Mitch Harder @ 2011-03-04 17:21 UTC (permalink / raw) To: Xin Zhong; +Cc: chris.mason, xin.zhong, linux-btrfs On Fri, Mar 4, 2011 at 9:33 AM, Mitch Harder <mitch.harder@sabayonlinux.org> wrote: > 2011/3/4 Xin Zhong <thierryzhong@hotmail.com>: >> >> It works well for me too. >> >> ---------------------------------------- >>> From: chris.mason@oracle.com >>> To: chris.mason@oracle.com >>> CC: xin.zhong@intel.com; mitch.harder@sabayonlinux.org; thierryzhon= g@hotmail.com; linux-btrfs@vger.kernel.org >>> Subject: RE: [PATCH] btrfs file write debugging patch >>> Date: Fri, 4 Mar 2011 07:19:39 -0500 >>> >>> Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500: >>> > Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500: >>> > > It seems that if we give an unaligned address to btrfs write an= d the buffer reside on more than 2 pages. It will trigger this bug. >>> > > If we give an aligned address to btrfs write, it works well no = matter how many pages are given. >>> > > >>> > > I use ftrace to observe it. It seems iov_iter_fault_in_readable= do not trigger pagefault handling when the address is not aligned. I d= o not quite understand the reason behind it. But the solution should be= to process the page one by one. And that's also what generic file writ= e routine does. >>> > > >>> > > Any suggestion are welcomed. Thanks! >>> > >>> > Great job guys. I'm using this on top of my debugging patch. It p= asses >>> > the unaligned test but I'll give it a real run tonight and look f= or >>> > other problems. >>> > >>> > (This is almost entirely untested, please don't use it quite yet) >>> >>> > >>> > -chris >>> > >>> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c >>> > index 89a6a26..6a44add 100644 >>> > --- a/fs/btrfs/file.c >>> > +++ b/fs/btrfs/file.c >>> > @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct= kiocb *iocb, >>> > >>> > copied =3D btrfs_copy_from_user(pos, num_pages, >>> > write_bytes, pages, &i); >>> > + >>> > + /* >>> > + * if we have trouble faulting in the pages, fall >>> > + * back to one page at a time >>> > + */ >>> > + if (copied < write_bytes) >>> > + nrptrs =3D 1; >>> > + >>> > if (copied =3D=3D 0) >>> > dirty_pages =3D 0; >>> > else >>> >>> Ok, this is working well for me. Anyone see any problems with it? >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btr= fs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > > I've applied this patch on top of the debugging patch at the head of > the thread, and I'm having trouble building gcc now. > > When building gcc-4.4.5, I get errors like the following: > > Comparing stages 2 and 3 > Bootstrap comparison failure! > ./cp/call.o differs > ./cp/decl.o differs > ./cp/pt.o differs > ./cp/class.o differs > ./cp/decl2.o differs > <....snip.....> > ./matrix-reorg.o differs > ./tree-inline.o differs > ./gcc.o differs > ./gcc-options.o differs > make[2]: *** [compare] Error 1 > make[1]: *** [stage3-bubble] Error 2 > make: *** [bootstrap-lean] Error 2 > emake failed > > I've went back and rebuilt my kernel without these two debugging > patches, and gcc-4.4.5 builds without error on that kernel. > > I haven't yet tested building gcc-4.4.5 with just the debugging patch > at the head of the thread, so I'll test that, and report back. > > But I was wondering if anybody else can replicate this issue. > > BTW, I've been doing most of my testing on an x86 system. =A0My x86_6= 4 > systems haven't had as much trouble, but I haven't been robustingly > checking my x86_64 systems for these issues. > > I noticed that page fault handling is different by architecture. > Some followup... I'm encountering this issue with "Bootstrap comparison failure!" in a gcc-4.4.5 build when only the patch at the head of the thread is applied (leaving the recent patch to limit pages to one-by-one on page fault out). I just hadn't run across this issue until I started playing with patches to limit the pages to one-by-one on page fault errors. So it may not be associated with the last patch. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-04 17:21 ` Mitch Harder @ 2011-03-05 1:00 ` Xin Zhong 2011-03-05 13:14 ` Mitch Harder 0 siblings, 1 reply; 40+ messages in thread From: Xin Zhong @ 2011-03-05 1:00 UTC (permalink / raw) To: Mitch Harder; +Cc: chris.mason, xin.zhong, linux-btrfs So it works well now with the two patches from Chris on your system. Am I right? ---------------------------------------- > Date: Fri, 4 Mar 2011 11:21:36 -0600 > Subject: Re: [PATCH] btrfs file write debugging patch > From: mitch.harder@sabayonlinux.org > To: thierryzhong@hotmail.com > CC: chris.mason@oracle.com; xin.zhong@intel.com; linux-btrfs@vger.kernel.org > > On Fri, Mar 4, 2011 at 9:33 AM, Mitch Harder > wrote: > > 2011/3/4 Xin Zhong : > >> > >> It works well for me too. > >> > >> ---------------------------------------- > >>> From: chris.mason@oracle.com > >>> To: chris.mason@oracle.com > >>> CC: xin.zhong@intel.com; mitch.harder@sabayonlinux.org; thierryzhong@hotmail.com; linux-btrfs@vger.kernel.org > >>> Subject: RE: [PATCH] btrfs file write debugging patch > >>> Date: Fri, 4 Mar 2011 07:19:39 -0500 > >>> > >>> Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500: > >>> > Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500: > >>> > > It seems that if we give an unaligned address to btrfs write and the buffer reside on more than 2 pages. It will trigger this bug. > >>> > > If we give an aligned address to btrfs write, it works well no matter how many pages are given. > >>> > > > >>> > > I use ftrace to observe it. It seems iov_iter_fault_in_readable do not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it. But the solution should be to process the page one by one. And that's also what generic file write routine does. > >>> > > > >>> > > Any suggestion are welcomed. Thanks! > >>> > > >>> > Great job guys. I'm using this on top of my debugging patch. It passes > >>> > the unaligned test but I'll give it a real run tonight and look for > >>> > other problems. > >>> > > >>> > (This is almost entirely untested, please don't use it quite yet) > >>> > >>> > > >>> > -chris > >>> > > >>> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > >>> > index 89a6a26..6a44add 100644 > >>> > --- a/fs/btrfs/file.c > >>> > +++ b/fs/btrfs/file.c > >>> > @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, > >>> > > >>> > copied = btrfs_copy_from_user(pos, num_pages, > >>> > write_bytes, pages, &i); > >>> > + > >>> > + /* > >>> > + * if we have trouble faulting in the pages, fall > >>> > + * back to one page at a time > >>> > + */ > >>> > + if (copied < write_bytes) > >>> > + nrptrs = 1; > >>> > + > >>> > if (copied == 0) > >>> > dirty_pages = 0; > >>> > else > >>> > >>> Ok, this is working well for me. Anyone see any problems with it? > >>> -- > >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > >>> the body of a message to majordomo@vger.kernel.org > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > >> > > > > I've applied this patch on top of the debugging patch at the head of > > the thread, and I'm having trouble building gcc now. > > > > When building gcc-4.4.5, I get errors like the following: > > > > Comparing stages 2 and 3 > > Bootstrap comparison failure! > > ./cp/call.o differs > > ./cp/decl.o differs > > ./cp/pt.o differs > > ./cp/class.o differs > > ./cp/decl2.o differs > > <....snip.....> > > ./matrix-reorg.o differs > > ./tree-inline.o differs > > ./gcc.o differs > > ./gcc-options.o differs > > make[2]: *** [compare] Error 1 > > make[1]: *** [stage3-bubble] Error 2 > > make: *** [bootstrap-lean] Error 2 > > emake failed > > > > I've went back and rebuilt my kernel without these two debugging > > patches, and gcc-4.4.5 builds without error on that kernel. > > > > I haven't yet tested building gcc-4.4.5 with just the debugging patch > > at the head of the thread, so I'll test that, and report back. > > > > But I was wondering if anybody else can replicate this issue. > > > > BTW, I've been doing most of my testing on an x86 system. My x86_64 > > systems haven't had as much trouble, but I haven't been robustingly > > checking my x86_64 systems for these issues. > > > > I noticed that page fault handling is different by architecture. > > > > Some followup... > > I'm encountering this issue with "Bootstrap comparison failure!" in a > gcc-4.4.5 build when only the patch at the head of the thread is > applied (leaving the recent patch to limit pages to one-by-one on page > fault out). > > I just hadn't run across this issue until I started playing with > patches to limit the pages to one-by-one on page fault errors. > > So it may not be associated with the last patch. > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-05 1:00 ` Xin Zhong @ 2011-03-05 13:14 ` Mitch Harder 2011-03-05 16:50 ` Mitch Harder 0 siblings, 1 reply; 40+ messages in thread From: Mitch Harder @ 2011-03-05 13:14 UTC (permalink / raw) To: Xin Zhong; +Cc: chris.mason, xin.zhong, linux-btrfs 2011/3/4 Xin Zhong <thierryzhong@hotmail.com>: > > So it works well now with the two patches from Chris on your system. Am I right? > No. I am getting errors building gcc-4.4.5 with the two patches from Chris. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-05 13:14 ` Mitch Harder @ 2011-03-05 16:50 ` Mitch Harder 2011-03-06 18:00 ` Chris Mason 0 siblings, 1 reply; 40+ messages in thread From: Mitch Harder @ 2011-03-05 16:50 UTC (permalink / raw) To: Xin Zhong; +Cc: chris.mason, xin.zhong, linux-btrfs I've constructed a test patch that is currently addressing all the issues on my system. The portion of Openmotif that was having issues with page faults works correctly with this patch, and gcc-4.4.5 builds without issue. I extracted only the portion of the first patch that corrects the handling of dirty_pages when copied==0, and incorporated the second patch that falls back to one-page-at-a-time if there are troubles with page faults. --- fs/btrfs/file.c 2011-03-05 07:34:43.025131607 -0600 +++ /usr/src/linux/fs/btrfs/file.c 2011-03-05 07:41:45.001260294 -0600 @@ -1023,8 +1023,20 @@ copied = btrfs_copy_from_user(pos, num_pages, write_bytes, pages, &i); - dirty_pages = (copied + offset + PAGE_CACHE_SIZE - 1) >> - PAGE_CACHE_SHIFT; + + /* + * if we have trouble faulting in the pages, fall + * back to one page at a time + */ + if (copied < write_bytes) + nrptrs = 1; + + if (copied == 0) + dirty_pages = 0; + else + dirty_pages = (copied + offset + + PAGE_CACHE_SIZE - 1) >> + PAGE_CACHE_SHIFT; if (num_pages > dirty_pages) { if (copied > 0) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-05 16:50 ` Mitch Harder @ 2011-03-06 18:00 ` Chris Mason 2011-03-07 0:58 ` Chris Mason 0 siblings, 1 reply; 40+ messages in thread From: Chris Mason @ 2011-03-06 18:00 UTC (permalink / raw) To: Mitch Harder; +Cc: Xin Zhong, xin.zhong, linux-btrfs Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500: > I've constructed a test patch that is currently addressing all the > issues on my system. > > The portion of Openmotif that was having issues with page faults works > correctly with this patch, and gcc-4.4.5 builds without issue. > > I extracted only the portion of the first patch that corrects the > handling of dirty_pages when copied==0, and incorporated the second > patch that falls back to one-page-at-a-time if there are troubles with > page faults. Just to make sure I understand, could you please post the full combined path that was giving you trouble with gcc? We do need to make sure the pages are properly up to date if we fall back to partial writes. -chris ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-06 18:00 ` Chris Mason @ 2011-03-07 0:58 ` Chris Mason 2011-03-07 6:07 ` Mitch Harder 0 siblings, 1 reply; 40+ messages in thread From: Chris Mason @ 2011-03-07 0:58 UTC (permalink / raw) To: Chris Mason; +Cc: Mitch Harder, Xin Zhong, xin.zhong, linux-btrfs Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500: > Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500: > > I've constructed a test patch that is currently addressing all the > > issues on my system. > > > > The portion of Openmotif that was having issues with page faults works > > correctly with this patch, and gcc-4.4.5 builds without issue. > > > > I extracted only the portion of the first patch that corrects the > > handling of dirty_pages when copied==0, and incorporated the second > > patch that falls back to one-page-at-a-time if there are troubles with > > page faults. > > Just to make sure I understand, could you please post the full combined > path that was giving you trouble with gcc? We do need to make sure the > pages are properly up to date if we fall back to partial writes. Ok, I was able to reproduce this easily with fsx. The problem is that I wasn't making sure the last partial page in the write was up to date when it was also the first page in the write. Here is the updated patch, it has all the fixes we've found so far: diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 7084140..5986ac7 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -763,6 +763,27 @@ out: } /* + * on error we return an unlocked page and the error value + * on success we return a locked page and 0 + */ +static int prepare_uptodate_page(struct page *page, u64 pos) +{ + int ret = 0; + + if ((pos & (PAGE_CACHE_SIZE - 1)) && !PageUptodate(page)) { + ret = btrfs_readpage(NULL, page); + if (ret) + return ret; + lock_page(page); + if (!PageUptodate(page)) { + unlock_page(page); + return -EIO; + } + } + return 0; +} + +/* * this gets pages into the page cache and locks them down, it also properly * waits for data=ordered extents to finish before allowing the pages to be * modified. @@ -777,6 +798,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file, unsigned long index = pos >> PAGE_CACHE_SHIFT; struct inode *inode = fdentry(file)->d_inode; int err = 0; + int faili = 0; u64 start_pos; u64 last_pos; @@ -794,15 +816,24 @@ again: for (i = 0; i < num_pages; i++) { pages[i] = grab_cache_page(inode->i_mapping, index + i); if (!pages[i]) { - int c; - for (c = i - 1; c >= 0; c--) { - unlock_page(pages[c]); - page_cache_release(pages[c]); - } - return -ENOMEM; + faili = i - 1; + err = -ENOMEM; + goto fail; + } + + if (i == 0) + err = prepare_uptodate_page(pages[i], pos); + if (i == num_pages - 1) + err = prepare_uptodate_page(pages[i], + pos + write_bytes); + if (err) { + page_cache_release(pages[i]); + faili = i - 1; + goto fail; } wait_on_page_writeback(pages[i]); } + err = 0; if (start_pos < inode->i_size) { struct btrfs_ordered_extent *ordered; lock_extent_bits(&BTRFS_I(inode)->io_tree, @@ -842,6 +873,14 @@ again: WARN_ON(!PageLocked(pages[i])); } return 0; +fail: + while (faili >= 0) { + unlock_page(pages[faili]); + page_cache_release(pages[faili]); + faili--; + } + return err; + } static ssize_t btrfs_file_aio_write(struct kiocb *iocb, @@ -851,7 +890,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, struct file *file = iocb->ki_filp; struct inode *inode = fdentry(file)->d_inode; struct btrfs_root *root = BTRFS_I(inode)->root; - struct page *pinned[2]; struct page **pages = NULL; struct iov_iter i; loff_t *ppos = &iocb->ki_pos; @@ -872,9 +910,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, will_write = ((file->f_flags & O_DSYNC) || IS_SYNC(inode) || (file->f_flags & O_DIRECT)); - pinned[0] = NULL; - pinned[1] = NULL; - start_pos = pos; vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE); @@ -962,32 +997,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, first_index = pos >> PAGE_CACHE_SHIFT; last_index = (pos + iov_iter_count(&i)) >> PAGE_CACHE_SHIFT; - /* - * there are lots of better ways to do this, but this code - * makes sure the first and last page in the file range are - * up to date and ready for cow - */ - if ((pos & (PAGE_CACHE_SIZE - 1))) { - pinned[0] = grab_cache_page(inode->i_mapping, first_index); - if (!PageUptodate(pinned[0])) { - ret = btrfs_readpage(NULL, pinned[0]); - BUG_ON(ret); - wait_on_page_locked(pinned[0]); - } else { - unlock_page(pinned[0]); - } - } - if ((pos + iov_iter_count(&i)) & (PAGE_CACHE_SIZE - 1)) { - pinned[1] = grab_cache_page(inode->i_mapping, last_index); - if (!PageUptodate(pinned[1])) { - ret = btrfs_readpage(NULL, pinned[1]); - BUG_ON(ret); - wait_on_page_locked(pinned[1]); - } else { - unlock_page(pinned[1]); - } - } - while (iov_iter_count(&i) > 0) { size_t offset = pos & (PAGE_CACHE_SIZE - 1); size_t write_bytes = min(iov_iter_count(&i), @@ -1024,8 +1033,20 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, copied = btrfs_copy_from_user(pos, num_pages, write_bytes, pages, &i); - dirty_pages = (copied + offset + PAGE_CACHE_SIZE - 1) >> - PAGE_CACHE_SHIFT; + + /* + * if we have trouble faulting in the pages, fall + * back to one page at a time + */ + if (copied < write_bytes) + nrptrs = 1; + + if (copied == 0) + dirty_pages = 0; + else + dirty_pages = (copied + offset + + PAGE_CACHE_SIZE - 1) >> + PAGE_CACHE_SHIFT; if (num_pages > dirty_pages) { if (copied > 0) @@ -1069,10 +1090,6 @@ out: err = ret; kfree(pages); - if (pinned[0]) - page_cache_release(pinned[0]); - if (pinned[1]) - page_cache_release(pinned[1]); *ppos = pos; /* ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-07 0:58 ` Chris Mason @ 2011-03-07 6:07 ` Mitch Harder 2011-03-07 6:37 ` Zhong, Xin 2011-03-07 19:56 ` Maria Wikström 0 siblings, 2 replies; 40+ messages in thread From: Mitch Harder @ 2011-03-07 6:07 UTC (permalink / raw) To: Chris Mason; +Cc: Xin Zhong, xin.zhong, linux-btrfs On Sun, Mar 6, 2011 at 6:58 PM, Chris Mason <chris.mason@oracle.com> wr= ote: > Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500: >> Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500: >> > I've constructed a test patch that is currently addressing all the >> > issues on my system. >> > >> > The portion of Openmotif that was having issues with page faults w= orks >> > correctly with this patch, and gcc-4.4.5 builds without issue. >> > >> > I extracted only the portion of the first patch that corrects the >> > handling of dirty_pages when copied=3D=3D0, and incorporated the s= econd >> > patch that falls back to one-page-at-a-time if there are troubles = with >> > page faults. >> >> Just to make sure I understand, could you please post the full combi= ned >> path that was giving you trouble with gcc? =A0We do need to make sur= e the >> pages are properly up to date if we fall back to partial writes. > > Ok, I was able to reproduce this easily with fsx. =A0The problem is t= hat I > wasn't making sure the last partial page in the write was up to date > when it was also the first page in the write. > > Here is the updated patch, it has all the fixes we've found so far: > This latest patch that Chris has sent out fixes the issues I've been encountering. I can build gcc-4.4.5 without problems. Also, the portion of Openmotif that was having issues with page faults is working correctly. Let me know if you still would like to see the path names for the portions of the gcc-4.4.5 build that were giving me issues. I didn't save that information, but I can regenerate it. But it sounds like it's irrelevant now. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-07 6:07 ` Mitch Harder @ 2011-03-07 6:37 ` Zhong, Xin 2011-03-07 19:56 ` Maria Wikström 1 sibling, 0 replies; 40+ messages in thread From: Zhong, Xin @ 2011-03-07 6:37 UTC (permalink / raw) To: Mitch Harder, Chris Mason; +Cc: Xin Zhong, linux-btrfs That's great! :-) -----Original Message----- =46rom: Mitch Harder [mailto:mitch.harder@sabayonlinux.org]=20 Sent: Monday, March 07, 2011 2:08 PM To: Chris Mason Cc: Xin Zhong; Zhong, Xin; linux-btrfs Subject: Re: [PATCH] btrfs file write debugging patch On Sun, Mar 6, 2011 at 6:58 PM, Chris Mason <chris.mason@oracle.com> wr= ote: > Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500: >> Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500: >> > I've constructed a test patch that is currently addressing all the >> > issues on my system. >> > >> > The portion of Openmotif that was having issues with page faults w= orks >> > correctly with this patch, and gcc-4.4.5 builds without issue. >> > >> > I extracted only the portion of the first patch that corrects the >> > handling of dirty_pages when copied=3D=3D0, and incorporated the s= econd >> > patch that falls back to one-page-at-a-time if there are troubles = with >> > page faults. >> >> Just to make sure I understand, could you please post the full combi= ned >> path that was giving you trouble with gcc? =A0We do need to make sur= e the >> pages are properly up to date if we fall back to partial writes. > > Ok, I was able to reproduce this easily with fsx. =A0The problem is t= hat I > wasn't making sure the last partial page in the write was up to date > when it was also the first page in the write. > > Here is the updated patch, it has all the fixes we've found so far: > This latest patch that Chris has sent out fixes the issues I've been encountering. I can build gcc-4.4.5 without problems. Also, the portion of Openmotif that was having issues with page faults is working correctly. Let me know if you still would like to see the path names for the portions of the gcc-4.4.5 build that were giving me issues. I didn't save that information, but I can regenerate it. But it sounds like it's irrelevant now. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-07 6:07 ` Mitch Harder 2011-03-07 6:37 ` Zhong, Xin @ 2011-03-07 19:56 ` Maria Wikström 2011-03-07 22:12 ` Johannes Hirte 1 sibling, 1 reply; 40+ messages in thread From: Maria Wikström @ 2011-03-07 19:56 UTC (permalink / raw) To: Mitch Harder; +Cc: Chris Mason, Xin Zhong, xin.zhong, linux-btrfs m=C3=A5n 2011-03-07 klockan 00:07 -0600 skrev Mitch Harder: > On Sun, Mar 6, 2011 at 6:58 PM, Chris Mason <chris.mason@oracle.com> = wrote: > > Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500: > >> Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500: > >> > I've constructed a test patch that is currently addressing all t= he > >> > issues on my system. > >> > > >> > The portion of Openmotif that was having issues with page faults= works > >> > correctly with this patch, and gcc-4.4.5 builds without issue. > >> > > >> > I extracted only the portion of the first patch that corrects th= e > >> > handling of dirty_pages when copied=3D=3D0, and incorporated the= second > >> > patch that falls back to one-page-at-a-time if there are trouble= s with > >> > page faults. > >> > >> Just to make sure I understand, could you please post the full com= bined > >> path that was giving you trouble with gcc? We do need to make sur= e the > >> pages are properly up to date if we fall back to partial writes. > > > > Ok, I was able to reproduce this easily with fsx. The problem is t= hat I > > wasn't making sure the last partial page in the write was up to dat= e > > when it was also the first page in the write. > > > > Here is the updated patch, it has all the fixes we've found so far: > > >=20 > This latest patch that Chris has sent out fixes the issues I've been > encountering. >=20 > I can build gcc-4.4.5 without problems. >=20 > Also, the portion of Openmotif that was having issues with page fault= s > is working correctly. >=20 > Let me know if you still would like to see the path names for the > portions of the gcc-4.4.5 build that were giving me issues. I didn't > save that information, but I can regenerate it. But it sounds like > it's irrelevant now. With the patch I can compile libgcrypt without any problem, so it solve= s my problems to. // Maria -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-07 19:56 ` Maria Wikström @ 2011-03-07 22:12 ` Johannes Hirte 2011-03-08 2:51 ` Zhong, Xin 0 siblings, 1 reply; 40+ messages in thread From: Johannes Hirte @ 2011-03-07 22:12 UTC (permalink / raw) To: Maria Wikström Cc: Mitch Harder, Chris Mason, Xin Zhong, xin.zhong, linux-btrfs On Monday 07 March 2011 20:56:50 Maria Wikstr=C3=B6m wrote: > m=C3=A5n 2011-03-07 klockan 00:07 -0600 skrev Mitch Harder: > > On Sun, Mar 6, 2011 at 6:58 PM, Chris Mason <chris.mason@oracle.com= >=20 wrote: > > > Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500: > > >> Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -050= 0: > > >> > I've constructed a test patch that is currently addressing all= the > > >> > issues on my system. > > >> >=20 > > >> > The portion of Openmotif that was having issues with page faul= ts > > >> > works correctly with this patch, and gcc-4.4.5 builds without > > >> > issue. > > >> >=20 > > >> > I extracted only the portion of the first patch that corrects = the > > >> > handling of dirty_pages when copied=3D=3D0, and incorporated t= he second > > >> > patch that falls back to one-page-at-a-time if there are troub= les > > >> > with page faults. > > >>=20 > > >> Just to make sure I understand, could you please post the full > > >> combined path that was giving you trouble with gcc? We do need = to > > >> make sure the pages are properly up to date if we fall back to > > >> partial writes. > > >=20 > > > Ok, I was able to reproduce this easily with fsx. The problem is= that > > > I wasn't making sure the last partial page in the write was up to= date > > > when it was also the first page in the write. > >=20 > > > Here is the updated patch, it has all the fixes we've found so fa= r: > > This latest patch that Chris has sent out fixes the issues I've bee= n > > encountering. > >=20 > > I can build gcc-4.4.5 without problems. > >=20 > > Also, the portion of Openmotif that was having issues with page fau= lts > > is working correctly. > >=20 > > Let me know if you still would like to see the path names for the > > portions of the gcc-4.4.5 build that were giving me issues. I didn= 't > > save that information, but I can regenerate it. But it sounds like > > it's irrelevant now. >=20 > With the patch I can compile libgcrypt without any problem, so it sol= ves > my problems to. Can confirm this. And the bug seems to be hardware-related. On my Penti= um4=20 system it was 100% reproducible, on my Atom-based system I couldn't tri= gger=20 it. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-07 22:12 ` Johannes Hirte @ 2011-03-08 2:51 ` Zhong, Xin 0 siblings, 0 replies; 40+ messages in thread From: Zhong, Xin @ 2011-03-08 2:51 UTC (permalink / raw) To: Johannes Hirte, Maria Wikström Cc: Mitch Harder, Chris Mason, Xin Zhong, linux-btrfs RnJvbSBteSBleHBlcmllbmNlLCBvbmx5IHg4NiAzMmJpdCBrZXJuZWwgaGFzIHRoaXMgcHJvYmxl bSBhbmQgNjRiaXQga2VybmVsIGRvIG5vdCBoYXZlIGl0LiANCg0KSG93ZXZlciwgYXRvbSBiYXNl ZCBzeXN0ZW0gZG8gbm90IGhhdmUgaXQgYWx0aG91Z2ggaXQncyBpbnN0YWxsZWQgd2l0aCBhIDMy Yml0IGtlcm5lbC4gDQoNCi0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQpGcm9tOiBKb2hhbm5l cyBIaXJ0ZSBbbWFpbHRvOmpvaGFubmVzLmhpcnRlQGZlbS50dS1pbG1lbmF1LmRlXSANClNlbnQ6 IFR1ZXNkYXksIE1hcmNoIDA4LCAyMDExIDY6MTIgQU0NClRvOiBNYXJpYSBXaWtzdHLDtm0NCkNj OiBNaXRjaCBIYXJkZXI7IENocmlzIE1hc29uOyBYaW4gWmhvbmc7IFpob25nLCBYaW47IGxpbnV4 LWJ0cmZzDQpTdWJqZWN0OiBSZTogW1BBVENIXSBidHJmcyBmaWxlIHdyaXRlIGRlYnVnZ2luZyBw YXRjaA0KDQpPbiBNb25kYXkgMDcgTWFyY2ggMjAxMSAyMDo1Njo1MCBNYXJpYSBXaWtzdHLDtm0g d3JvdGU6DQo+IG3DpW4gMjAxMS0wMy0wNyBrbG9ja2FuIDAwOjA3IC0wNjAwIHNrcmV2IE1pdGNo IEhhcmRlcjoNCj4gPiBPbiBTdW4sIE1hciA2LCAyMDExIGF0IDY6NTggUE0sIENocmlzIE1hc29u IDxjaHJpcy5tYXNvbkBvcmFjbGUuY29tPiANCndyb3RlOg0KPiA+ID4gRXhjZXJwdHMgZnJvbSBD aHJpcyBNYXNvbidzIG1lc3NhZ2Ugb2YgMjAxMS0wMy0wNiAxMzowMDoyNyAtMDUwMDoNCj4gPiA+ PiBFeGNlcnB0cyBmcm9tIE1pdGNoIEhhcmRlcidzIG1lc3NhZ2Ugb2YgMjAxMS0wMy0wNSAxMTo1 MDoxNCAtMDUwMDoNCj4gPiA+PiA+IEkndmUgY29uc3RydWN0ZWQgYSB0ZXN0IHBhdGNoIHRoYXQg aXMgY3VycmVudGx5IGFkZHJlc3NpbmcgYWxsIHRoZQ0KPiA+ID4+ID4gaXNzdWVzIG9uIG15IHN5 c3RlbS4NCj4gPiA+PiA+IA0KPiA+ID4+ID4gVGhlIHBvcnRpb24gb2YgT3Blbm1vdGlmIHRoYXQg d2FzIGhhdmluZyBpc3N1ZXMgd2l0aCBwYWdlIGZhdWx0cw0KPiA+ID4+ID4gd29ya3MgY29ycmVj dGx5IHdpdGggdGhpcyBwYXRjaCwgYW5kIGdjYy00LjQuNSBidWlsZHMgd2l0aG91dA0KPiA+ID4+ ID4gaXNzdWUuDQo+ID4gPj4gPiANCj4gPiA+PiA+IEkgZXh0cmFjdGVkIG9ubHkgdGhlIHBvcnRp b24gb2YgdGhlIGZpcnN0IHBhdGNoIHRoYXQgY29ycmVjdHMgdGhlDQo+ID4gPj4gPiBoYW5kbGlu ZyBvZiBkaXJ0eV9wYWdlcyB3aGVuIGNvcGllZD09MCwgYW5kIGluY29ycG9yYXRlZCB0aGUgc2Vj b25kDQo+ID4gPj4gPiBwYXRjaCB0aGF0IGZhbGxzIGJhY2sgdG8gb25lLXBhZ2UtYXQtYS10aW1l IGlmIHRoZXJlIGFyZSB0cm91Ymxlcw0KPiA+ID4+ID4gd2l0aCBwYWdlIGZhdWx0cy4NCj4gPiA+ PiANCj4gPiA+PiBKdXN0IHRvIG1ha2Ugc3VyZSBJIHVuZGVyc3RhbmQsIGNvdWxkIHlvdSBwbGVh c2UgcG9zdCB0aGUgZnVsbA0KPiA+ID4+IGNvbWJpbmVkIHBhdGggdGhhdCB3YXMgZ2l2aW5nIHlv dSB0cm91YmxlIHdpdGggZ2NjPyAgV2UgZG8gbmVlZCB0bw0KPiA+ID4+IG1ha2Ugc3VyZSB0aGUg cGFnZXMgYXJlIHByb3Blcmx5IHVwIHRvIGRhdGUgaWYgd2UgZmFsbCBiYWNrIHRvDQo+ID4gPj4g cGFydGlhbCB3cml0ZXMuDQo+ID4gPiANCj4gPiA+IE9rLCBJIHdhcyBhYmxlIHRvIHJlcHJvZHVj ZSB0aGlzIGVhc2lseSB3aXRoIGZzeC4gIFRoZSBwcm9ibGVtIGlzIHRoYXQNCj4gPiA+IEkgd2Fz bid0IG1ha2luZyBzdXJlIHRoZSBsYXN0IHBhcnRpYWwgcGFnZSBpbiB0aGUgd3JpdGUgd2FzIHVw IHRvIGRhdGUNCj4gPiA+IHdoZW4gaXQgd2FzIGFsc28gdGhlIGZpcnN0IHBhZ2UgaW4gdGhlIHdy aXRlLg0KPiA+IA0KPiA+ID4gSGVyZSBpcyB0aGUgdXBkYXRlZCBwYXRjaCwgaXQgaGFzIGFsbCB0 aGUgZml4ZXMgd2UndmUgZm91bmQgc28gZmFyOg0KPiA+IFRoaXMgbGF0ZXN0IHBhdGNoIHRoYXQg Q2hyaXMgaGFzIHNlbnQgb3V0IGZpeGVzIHRoZSBpc3N1ZXMgSSd2ZSBiZWVuDQo+ID4gZW5jb3Vu dGVyaW5nLg0KPiA+IA0KPiA+IEkgY2FuIGJ1aWxkIGdjYy00LjQuNSB3aXRob3V0IHByb2JsZW1z Lg0KPiA+IA0KPiA+IEFsc28sIHRoZSBwb3J0aW9uIG9mIE9wZW5tb3RpZiB0aGF0IHdhcyBoYXZp bmcgaXNzdWVzIHdpdGggcGFnZSBmYXVsdHMNCj4gPiBpcyB3b3JraW5nIGNvcnJlY3RseS4NCj4g PiANCj4gPiBMZXQgbWUga25vdyBpZiB5b3Ugc3RpbGwgd291bGQgbGlrZSB0byBzZWUgdGhlIHBh dGggbmFtZXMgZm9yIHRoZQ0KPiA+IHBvcnRpb25zIG9mIHRoZSBnY2MtNC40LjUgYnVpbGQgdGhh dCB3ZXJlIGdpdmluZyBtZSBpc3N1ZXMuICBJIGRpZG4ndA0KPiA+IHNhdmUgdGhhdCBpbmZvcm1h dGlvbiwgYnV0IEkgY2FuIHJlZ2VuZXJhdGUgaXQuICBCdXQgaXQgc291bmRzIGxpa2UNCj4gPiBp dCdzIGlycmVsZXZhbnQgbm93Lg0KPiANCj4gV2l0aCB0aGUgcGF0Y2ggSSBjYW4gY29tcGlsZSBs aWJnY3J5cHQgd2l0aG91dCBhbnkgcHJvYmxlbSwgc28gaXQgc29sdmVzDQo+IG15IHByb2JsZW1z IHRvLg0KDQpDYW4gY29uZmlybSB0aGlzLiBBbmQgdGhlIGJ1ZyBzZWVtcyB0byBiZSBoYXJkd2Fy ZS1yZWxhdGVkLiBPbiBteSBQZW50aXVtNCANCnN5c3RlbSBpdCB3YXMgMTAwJSByZXByb2R1Y2li bGUsIG9uIG15IEF0b20tYmFzZWQgc3lzdGVtIEkgY291bGRuJ3QgdHJpZ2dlciANCml0Lg0K ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2]Btrfs: pwrite blocked when writing from the mmaped buffer of the same page @ 2011-02-25 18:43 Mitch Harder 2011-02-28 1:46 ` [PATCH] btrfs file write debugging patch Chris Mason 0 siblings, 1 reply; 40+ messages in thread From: Mitch Harder @ 2011-02-25 18:43 UTC (permalink / raw) To: Chris Mason Cc: Maria Wikström, Zhong, Xin, Johannes Hirte, linux-btrfs@vger.kernel.org On Fri, Feb 25, 2011 at 11:11 AM, Mitch Harder <mitch.harder@sabayonlinux.org> wrote: > On Thu, Feb 24, 2011 at 5:14 PM, Mitch Harder > <mitch.harder@sabayonlinux.org> wrote: >> On Thu, Feb 24, 2011 at 10:32 AM, Mitch Harder >> <mitch.harder@sabayonlinux.org> wrote: >>> On Thu, Feb 24, 2011 at 10:19 AM, Chris Mason <chris.mason@oracle.c= om> wrote: >>>> Excerpts from Mitch Harder's message of 2011-02-24 11:03:07 -0500: >>>>> On Thu, Feb 24, 2011 at 10:00 AM, Chris Mason <chris.mason@oracle= =2Ecom> wrote: >>>>> > Excerpts from Mitch Harder's message of 2011-02-24 10:55:15 -05= 00: >>>>> >> 2011/2/24 Maria Wikstr=F6m <maria@ponstudios.se>: >>>>> >> > m=E5n 2011-02-21 klockan 09:51 +0800 skrev Zhong, Xin: >>>>> >> >> The backtrace in your attachment looks like a known bug of = 2.6.37 which have already been fixed in 2.6.38. I have no idea why late= st btrfs still hang in your environment if there's no debug info... >>>>> >> >> >>>>> >> > >>>>> >> > Haha, yes that's very hard :) >>>>> >> > >>>>> >> > 2.6.38-rc6 and btrfs-unstable behaves the same way. I can cl= ose the >>>>> >> > process with ctrl+c and it disappear a few seconds later. Th= ere is no >>>>> >> > CPU usage. Reading works because I can start htop and watch = "svn info" >>>>> >> > disappear, but everything writing to btrfs slows down to a c= rawl. It >>>>> >> > takes about 1 minute to log in. So I had to put the logs on = an other >>>>> >> > partition using ext3 to get the output from sysrq+t. >>>>> >> > >>>>> >> >>>>> >> I believe I've been experiencing this issue also. =A0However, = my problem >>>>> >> usually results in a "No space left on device" error rather th= an a >>>>> >> lock-up or crash. =A0But I've bisected my issue to this patch,= and my >>>>> >> "btrfs fi show" and "btrfs fi df" looks similar to others who'= ve >>>>> >> posted to this tread with all my space being allocated, but no= t used. >>>>> >> >>>>> > >>>>> > Sorry, which patch did you bisect the problem down to? >>>>> > >>>>> >>>>> The patch at the head of this thread: >>>>> >>>>> Btrfs: pwrite blocked when writing from the mmaped buffer of the = same page >>>> >>>> Hmmm, that patch shouldn't be changing our performance under delal= loc >>>> pressure, and it really shouldn't impact early enospc. >>>> >>> >>> I've bisected this issue around where this patch went into git, and >>> I've also constructed a testing patch that reverts this patch, and >>> placed it on top of the current Btrfs git sources (I understand thi= s >>> patch addresses a real issue, this was just for testing). >>> >>> It could be that this patch just "uncovers" another problem, but al= l >>> my tests seem to point to this patch triggering this issue. >>> >> >> I don't belief the previous ftrace I supplied had a large enough sco= pe >> to capture the issue. >> >> I've expanded my ftrace buffer, and filtered out everything but btrf= s* >> function calls ("# echo btrfs* > >> /sys/kernel/debug/tracing/set_ftrace_filter"). >> >> In this trace, I see btrfs spending a great deal of time in a while >> loop (while (iov_iter_count(&i) > 0) {)) in the btrfs_file_aio_write= () >> function in file.c without exiting the function. >> >> I'm going to try to inject some debugging trace_printk() statements = to >> find if that portion of code is proceeding normally with my test cas= e. >> >> I've put my expanded trace up on my local server, but my upload >> bandwidth is pretty sad, and it may take a few minutes to transfer >> even though it's only a 6MB file. >> >> http://dontpanic.dyndns.org/trace-openmotif-btrfs-v3.gz >> > > Apologies for only hitting "Reply" instead of "Reply-All" on my last = message. > > I've inserted additional trace_printk() to the btrfs_file_aio_write() > and btrfs_copy_from_user() function in file.c in order to characteriz= e > the problem I've been encountering. > > I can see btrfs getting stuck in a loop in the "while > (iov_iter_count(&i) > 0) {}" portion of the btrfs_file_aio_write() > function. > > The loop is more-or-less following this process (from within the > "while (iov_iter_count(&i) > 0) {}" loop): > > (1) Reserve some space with btrfs_delalloc_reserve_space() > (2) Prepare the reserved space with prepare_pages() > (3) Call btrfs_copy_from_user() to copy to the prepared space. > -------------> From btrfs_copy_from_user() > (4) ........Try to copy with copied =3D iov_iter_copy_from_user_atomi= c() > (5) ........The above operation results with copied =3D=3D 0. Break a= nd > return with a return value of 0 bytes copied. > (6) There is no special handling for copied =3D=3D 0 in the "while > (iov_iter_count(&i) > 0) {}" loop, so it loops back around, reserves > some more space, and tries again. > > If I look back at how the code was set up before the patch at the hea= d > of this thread was applied (Btrfs: pwrite blocked when writing from > the mmaped buffer of the same page), the btrfs_copy_from_user() > function had some handling for "copied =3D=3D 0" that would change th= e > scope of the amount to write, and loop back to try the write again. > > I attempted to construct a patch that just reverted the handling for > "copied =3D=3D 0" in btrfs_copy_from_user(), however, that just resul= ted > in my computer locking up when it reached the point where it was > previously beginning to allocate disk space. > > So, I apologize for not having a patch to address the issue I'm > seeing, but I hope I've added some insight. > Some clarification on my previous message... After looking at my ftrace log more closely, I can see where Btrfs is trying to release the allocated pages. However, the calculation for the number of dirty_pages is equal to 1 when "copied =3D=3D 0". So I'm seeing at least two problems: (1) It keeps looping when "copied =3D=3D 0". (2) One dirty page is not being released on every loop even though "copied =3D=3D 0" (at least this problem keeps it from being an infinit= e loop by eventually exhausting reserveable space on the disk). -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH] btrfs file write debugging patch 2011-02-25 18:43 [PATCH v2]Btrfs: pwrite blocked when writing from the mmaped buffer of the same page Mitch Harder @ 2011-02-28 1:46 ` Chris Mason 2011-02-28 8:56 ` Zhong, Xin 2011-02-28 10:13 ` Johannes Hirte 0 siblings, 2 replies; 40+ messages in thread From: Chris Mason @ 2011-02-28 1:46 UTC (permalink / raw) To: Mitch Harder Cc: Maria Wikström, Zhong, Xin, Johannes Hirte, linux-btrfs@vger.kernel.org Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500: > Some clarification on my previous message... > > After looking at my ftrace log more closely, I can see where Btrfs is > trying to release the allocated pages. However, the calculation for > the number of dirty_pages is equal to 1 when "copied == 0". > > So I'm seeing at least two problems: > (1) It keeps looping when "copied == 0". > (2) One dirty page is not being released on every loop even though > "copied == 0" (at least this problem keeps it from being an infinite > loop by eventually exhausting reserveable space on the disk). Hi everyone, There are actually tow bugs here. First the one that Mitch hit, and a second one that still results in bad file_write results with my debugging hunks (the first two hunks below) in place. My patch fixes Mitch's bug by checking for copied == 0 after btrfs_copy_from_user and going the correct delalloc accounting. This one looks solved, but you'll notice the patch is bigger. First, I add some random failures to btrfs_copy_from_user() by failing everyone once and a while. This was much more reliable than trying to use memory pressure than making copy_from_user fail. If copy_from_user fails and we partially update a page, we end up with a page that may go away due to memory pressure. But, btrfs_file_write assumes that only the first and last page may have good data that needs to be read off the disk. This patch ditches that code and puts it into prepare_pages instead. But I'm still having some errors during long stress.sh runs. Ideas are more than welcome, hopefully some other timezones will kick in ideas while I sleep. diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 7084140..89a6a26 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -54,6 +54,13 @@ static noinline int btrfs_copy_from_user(loff_t pos, int num_pages, int offset = pos & (PAGE_CACHE_SIZE - 1); int total_copied = 0; + if ((jiffies % 10) == 0) + return 0; + + if ((jiffies % 25) == 0) { + write_bytes /= 2; + } + while (write_bytes > 0) { size_t count = min_t(size_t, PAGE_CACHE_SIZE - offset, write_bytes); @@ -763,6 +770,26 @@ out: } /* + * on error we return an unlocked page and the error value + * on success we return a locked page and 0 + */ +static int prepare_uptodate_page(struct page *page, u64 pos) +{ + int ret = 0; + if ((pos & (PAGE_CACHE_SIZE - 1)) && !PageUptodate(page)) { + ret = btrfs_readpage(NULL, page); + if (ret) + return ret; + lock_page(page); + if (!PageUptodate(page)) { + unlock_page(page); + return -EIO; + } + } + return 0; +} + +/* * this gets pages into the page cache and locks them down, it also properly * waits for data=ordered extents to finish before allowing the pages to be * modified. @@ -777,6 +804,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file, unsigned long index = pos >> PAGE_CACHE_SHIFT; struct inode *inode = fdentry(file)->d_inode; int err = 0; + int faili = 0; u64 start_pos; u64 last_pos; @@ -794,15 +822,24 @@ again: for (i = 0; i < num_pages; i++) { pages[i] = grab_cache_page(inode->i_mapping, index + i); if (!pages[i]) { - int c; - for (c = i - 1; c >= 0; c--) { - unlock_page(pages[c]); - page_cache_release(pages[c]); - } - return -ENOMEM; + faili = i - 1; + err = -ENOMEM; + goto fail; + } + + if (i == 0) + err = prepare_uptodate_page(pages[i], pos); + else if (i == num_pages - 1) + err = prepare_uptodate_page(pages[i], + pos + write_bytes); + if (err) { + page_cache_release(pages[i]); + faili = i - 1; + goto fail; } wait_on_page_writeback(pages[i]); } + err = 0; if (start_pos < inode->i_size) { struct btrfs_ordered_extent *ordered; lock_extent_bits(&BTRFS_I(inode)->io_tree, @@ -842,6 +879,14 @@ again: WARN_ON(!PageLocked(pages[i])); } return 0; +fail: + while (faili >= 0) { + unlock_page(pages[faili]); + page_cache_release(pages[faili]); + faili--; + } + return err; + } static ssize_t btrfs_file_aio_write(struct kiocb *iocb, @@ -851,7 +896,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, struct file *file = iocb->ki_filp; struct inode *inode = fdentry(file)->d_inode; struct btrfs_root *root = BTRFS_I(inode)->root; - struct page *pinned[2]; struct page **pages = NULL; struct iov_iter i; loff_t *ppos = &iocb->ki_pos; @@ -872,9 +916,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, will_write = ((file->f_flags & O_DSYNC) || IS_SYNC(inode) || (file->f_flags & O_DIRECT)); - pinned[0] = NULL; - pinned[1] = NULL; - start_pos = pos; vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE); @@ -962,32 +1003,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, first_index = pos >> PAGE_CACHE_SHIFT; last_index = (pos + iov_iter_count(&i)) >> PAGE_CACHE_SHIFT; - /* - * there are lots of better ways to do this, but this code - * makes sure the first and last page in the file range are - * up to date and ready for cow - */ - if ((pos & (PAGE_CACHE_SIZE - 1))) { - pinned[0] = grab_cache_page(inode->i_mapping, first_index); - if (!PageUptodate(pinned[0])) { - ret = btrfs_readpage(NULL, pinned[0]); - BUG_ON(ret); - wait_on_page_locked(pinned[0]); - } else { - unlock_page(pinned[0]); - } - } - if ((pos + iov_iter_count(&i)) & (PAGE_CACHE_SIZE - 1)) { - pinned[1] = grab_cache_page(inode->i_mapping, last_index); - if (!PageUptodate(pinned[1])) { - ret = btrfs_readpage(NULL, pinned[1]); - BUG_ON(ret); - wait_on_page_locked(pinned[1]); - } else { - unlock_page(pinned[1]); - } - } - while (iov_iter_count(&i) > 0) { size_t offset = pos & (PAGE_CACHE_SIZE - 1); size_t write_bytes = min(iov_iter_count(&i), @@ -1024,8 +1039,12 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, copied = btrfs_copy_from_user(pos, num_pages, write_bytes, pages, &i); - dirty_pages = (copied + offset + PAGE_CACHE_SIZE - 1) >> - PAGE_CACHE_SHIFT; + if (copied == 0) + dirty_pages = 0; + else + dirty_pages = (copied + offset + + PAGE_CACHE_SIZE - 1) >> + PAGE_CACHE_SHIFT; if (num_pages > dirty_pages) { if (copied > 0) @@ -1069,10 +1088,6 @@ out: err = ret; kfree(pages); - if (pinned[0]) - page_cache_release(pinned[0]); - if (pinned[1]) - page_cache_release(pinned[1]); *ppos = pos; /* ^ permalink raw reply related [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-02-28 1:46 ` [PATCH] btrfs file write debugging patch Chris Mason @ 2011-02-28 8:56 ` Zhong, Xin 2011-02-28 14:02 ` Chris Mason 2011-02-28 10:13 ` Johannes Hirte 1 sibling, 1 reply; 40+ messages in thread From: Zhong, Xin @ 2011-02-28 8:56 UTC (permalink / raw) To: Chris Mason, Mitch Harder Cc: Maria Wikström, Johannes Hirte, linux-btrfs@vger.kernel.org T25lIHBvc3NpYmxlIGlzc3VlIEkgY2FuIHNlZSBpcyBpbiB0aGUgcmFuZG9tIGZhaWx1cmUgY2Fz ZSAjMiB0aGF0IGNvcHlfZnJvbV91c2VyIG9ubHkgcHJvY2VzcyBoYWxmIG9mIHRoZSBkYXRhLiAN Cg0KRm9yIGV4YW1wbGUsIGlmIGl0IHdyaXRlIGEgNGsgYWxpZ25lZCBwYWdlIGFuZCBjb3B5X2Zy b21fdXNlciBvbmx5IHdyaXRlIDJrLiBUaGVuIGl0IHdpbGwgbm90IGNhbGwgYnRyZnNfZGVsYWxs b2NfcmVsZWFzZV9zcGFjZSBzaW5jZSBudW1fcGFnZXMgYW5kIGRpcnR5X3BhZ2VzIGFyZSBib3Ro IDEuIA0KSW4gdGhlIG5leHQgcm91bmQsIGl0IHdyaXRlIGFub3RoZXIgMmsgYW5kIGJ0cmZzX2Rl bGFsbG9jX3Jlc2VydmVfc3BhY2UgaXMgY2FsbGVkIHR3aWNlIGZvciB0aGUgc2FtZSBwYWdlLiAN Cg0KSXMgaXQgYSBwcm9ibGVtPyBUaGFua3MhDQoNCi0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0t DQpGcm9tOiBDaHJpcyBNYXNvbiBbbWFpbHRvOmNocmlzLm1hc29uQG9yYWNsZS5jb21dIA0KU2Vu dDogTW9uZGF5LCBGZWJydWFyeSAyOCwgMjAxMSA5OjQ2IEFNDQpUbzogTWl0Y2ggSGFyZGVyDQpD YzogTWFyaWEgV2lrc3Ryw7ZtOyBaaG9uZywgWGluOyBKb2hhbm5lcyBIaXJ0ZTsgbGludXgtYnRy ZnNAdmdlci5rZXJuZWwub3JnDQpTdWJqZWN0OiBbUEFUQ0hdIGJ0cmZzIGZpbGUgd3JpdGUgZGVi dWdnaW5nIHBhdGNoDQoNCkV4Y2VycHRzIGZyb20gTWl0Y2ggSGFyZGVyJ3MgbWVzc2FnZSBvZiAy MDExLTAyLTI1IDEzOjQzOjM3IC0wNTAwOg0KPiBTb21lIGNsYXJpZmljYXRpb24gb24gbXkgcHJl dmlvdXMgbWVzc2FnZS4uLg0KPiANCj4gQWZ0ZXIgbG9va2luZyBhdCBteSBmdHJhY2UgbG9nIG1v cmUgY2xvc2VseSwgSSBjYW4gc2VlIHdoZXJlIEJ0cmZzIGlzDQo+IHRyeWluZyB0byByZWxlYXNl IHRoZSBhbGxvY2F0ZWQgcGFnZXMuICBIb3dldmVyLCB0aGUgY2FsY3VsYXRpb24gZm9yDQo+IHRo ZSBudW1iZXIgb2YgZGlydHlfcGFnZXMgaXMgZXF1YWwgdG8gMSB3aGVuICJjb3BpZWQgPT0gMCIu DQo+IA0KPiBTbyBJJ20gc2VlaW5nIGF0IGxlYXN0IHR3byBwcm9ibGVtczoNCj4gKDEpICBJdCBr ZWVwcyBsb29waW5nIHdoZW4gImNvcGllZCA9PSAwIi4NCj4gKDIpICBPbmUgZGlydHkgcGFnZSBp cyBub3QgYmVpbmcgcmVsZWFzZWQgb24gZXZlcnkgbG9vcCBldmVuIHRob3VnaA0KPiAiY29waWVk ID09IDAiIChhdCBsZWFzdCB0aGlzIHByb2JsZW0ga2VlcHMgaXQgZnJvbSBiZWluZyBhbiBpbmZp bml0ZQ0KPiBsb29wIGJ5IGV2ZW50dWFsbHkgZXhoYXVzdGluZyByZXNlcnZlYWJsZSBzcGFjZSBv biB0aGUgZGlzaykuDQoNCkhpIGV2ZXJ5b25lLA0KDQpUaGVyZSBhcmUgYWN0dWFsbHkgdG93IGJ1 Z3MgaGVyZS4gIEZpcnN0IHRoZSBvbmUgdGhhdCBNaXRjaCBoaXQsIGFuZCBhDQpzZWNvbmQgb25l IHRoYXQgc3RpbGwgcmVzdWx0cyBpbiBiYWQgZmlsZV93cml0ZSByZXN1bHRzIHdpdGggbXkNCmRl YnVnZ2luZyBodW5rcyAodGhlIGZpcnN0IHR3byBodW5rcyBiZWxvdykgaW4gcGxhY2UuDQoNCk15 IHBhdGNoIGZpeGVzIE1pdGNoJ3MgYnVnIGJ5IGNoZWNraW5nIGZvciBjb3BpZWQgPT0gMCBhZnRl cg0KYnRyZnNfY29weV9mcm9tX3VzZXIgYW5kIGdvaW5nIHRoZSBjb3JyZWN0IGRlbGFsbG9jIGFj Y291bnRpbmcuICBUaGlzDQpvbmUgbG9va3Mgc29sdmVkLCBidXQgeW91J2xsIG5vdGljZSB0aGUg cGF0Y2ggaXMgYmlnZ2VyLg0KDQpGaXJzdCwgSSBhZGQgc29tZSByYW5kb20gZmFpbHVyZXMgdG8g YnRyZnNfY29weV9mcm9tX3VzZXIoKSBieSBmYWlsaW5nDQpldmVyeW9uZSBvbmNlIGFuZCBhIHdo aWxlLiAgVGhpcyB3YXMgbXVjaCBtb3JlIHJlbGlhYmxlIHRoYW4gdHJ5aW5nIHRvDQp1c2UgbWVt b3J5IHByZXNzdXJlIHRoYW4gbWFraW5nIGNvcHlfZnJvbV91c2VyIGZhaWwuDQoNCklmIGNvcHlf ZnJvbV91c2VyIGZhaWxzIGFuZCB3ZSBwYXJ0aWFsbHkgdXBkYXRlIGEgcGFnZSwgd2UgZW5kIHVw IHdpdGggYQ0KcGFnZSB0aGF0IG1heSBnbyBhd2F5IGR1ZSB0byBtZW1vcnkgcHJlc3N1cmUuICBC dXQsIGJ0cmZzX2ZpbGVfd3JpdGUNCmFzc3VtZXMgdGhhdCBvbmx5IHRoZSBmaXJzdCBhbmQgbGFz dCBwYWdlIG1heSBoYXZlIGdvb2QgZGF0YSB0aGF0IG5lZWRzDQp0byBiZSByZWFkIG9mZiB0aGUg ZGlzay4NCg0KVGhpcyBwYXRjaCBkaXRjaGVzIHRoYXQgY29kZSBhbmQgcHV0cyBpdCBpbnRvIHBy ZXBhcmVfcGFnZXMgaW5zdGVhZC4NCkJ1dCBJJ20gc3RpbGwgaGF2aW5nIHNvbWUgZXJyb3JzIGR1 cmluZyBsb25nIHN0cmVzcy5zaCBydW5zLiAgSWRlYXMgYXJlDQptb3JlIHRoYW4gd2VsY29tZSwg aG9wZWZ1bGx5IHNvbWUgb3RoZXIgdGltZXpvbmVzIHdpbGwga2ljayBpbiBpZGVhcw0Kd2hpbGUg SSBzbGVlcC4NCg0KZGlmZiAtLWdpdCBhL2ZzL2J0cmZzL2ZpbGUuYyBiL2ZzL2J0cmZzL2ZpbGUu Yw0KaW5kZXggNzA4NDE0MC4uODlhNmEyNiAxMDA2NDQNCi0tLSBhL2ZzL2J0cmZzL2ZpbGUuYw0K KysrIGIvZnMvYnRyZnMvZmlsZS5jDQpAQCAtNTQsNiArNTQsMTMgQEAgc3RhdGljIG5vaW5saW5l IGludCBidHJmc19jb3B5X2Zyb21fdXNlcihsb2ZmX3QgcG9zLCBpbnQgbnVtX3BhZ2VzLA0KIAlp bnQgb2Zmc2V0ID0gcG9zICYgKFBBR0VfQ0FDSEVfU0laRSAtIDEpOw0KIAlpbnQgdG90YWxfY29w aWVkID0gMDsNCiANCisJaWYgKChqaWZmaWVzICUgMTApID09IDApDQorCQlyZXR1cm4gMDsNCisN CisJaWYgKChqaWZmaWVzICUgMjUpID09IDApIHsNCisJCXdyaXRlX2J5dGVzIC89IDI7DQorCX0N CisNCiAJd2hpbGUgKHdyaXRlX2J5dGVzID4gMCkgew0KIAkJc2l6ZV90IGNvdW50ID0gbWluX3Qo c2l6ZV90LA0KIAkJCQkgICAgIFBBR0VfQ0FDSEVfU0laRSAtIG9mZnNldCwgd3JpdGVfYnl0ZXMp Ow0KQEAgLTc2Myw2ICs3NzAsMjYgQEAgb3V0Og0KIH0NCiANCiAvKg0KKyAqIG9uIGVycm9yIHdl IHJldHVybiBhbiB1bmxvY2tlZCBwYWdlIGFuZCB0aGUgZXJyb3IgdmFsdWUNCisgKiBvbiBzdWNj ZXNzIHdlIHJldHVybiBhIGxvY2tlZCBwYWdlIGFuZCAwDQorICovDQorc3RhdGljIGludCBwcmVw YXJlX3VwdG9kYXRlX3BhZ2Uoc3RydWN0IHBhZ2UgKnBhZ2UsIHU2NCBwb3MpDQorew0KKwlpbnQg cmV0ID0gMDsNCisJaWYgKChwb3MgJiAoUEFHRV9DQUNIRV9TSVpFIC0gMSkpICYmICFQYWdlVXB0 b2RhdGUocGFnZSkpIHsNCisJCXJldCA9IGJ0cmZzX3JlYWRwYWdlKE5VTEwsIHBhZ2UpOw0KKwkJ aWYgKHJldCkNCisJCQlyZXR1cm4gcmV0Ow0KKwkJbG9ja19wYWdlKHBhZ2UpOw0KKwkJaWYgKCFQ YWdlVXB0b2RhdGUocGFnZSkpIHsNCisJCQl1bmxvY2tfcGFnZShwYWdlKTsNCisJCQlyZXR1cm4g LUVJTzsNCisJCX0NCisJfQ0KKwlyZXR1cm4gMDsNCit9DQorDQorLyoNCiAgKiB0aGlzIGdldHMg cGFnZXMgaW50byB0aGUgcGFnZSBjYWNoZSBhbmQgbG9ja3MgdGhlbSBkb3duLCBpdCBhbHNvIHBy b3Blcmx5DQogICogd2FpdHMgZm9yIGRhdGE9b3JkZXJlZCBleHRlbnRzIHRvIGZpbmlzaCBiZWZv cmUgYWxsb3dpbmcgdGhlIHBhZ2VzIHRvIGJlDQogICogbW9kaWZpZWQuDQpAQCAtNzc3LDYgKzgw NCw3IEBAIHN0YXRpYyBub2lubGluZSBpbnQgcHJlcGFyZV9wYWdlcyhzdHJ1Y3QgYnRyZnNfcm9v dCAqcm9vdCwgc3RydWN0IGZpbGUgKmZpbGUsDQogCXVuc2lnbmVkIGxvbmcgaW5kZXggPSBwb3Mg Pj4gUEFHRV9DQUNIRV9TSElGVDsNCiAJc3RydWN0IGlub2RlICppbm9kZSA9IGZkZW50cnkoZmls ZSktPmRfaW5vZGU7DQogCWludCBlcnIgPSAwOw0KKwlpbnQgZmFpbGkgPSAwOw0KIAl1NjQgc3Rh cnRfcG9zOw0KIAl1NjQgbGFzdF9wb3M7DQogDQpAQCAtNzk0LDE1ICs4MjIsMjQgQEAgYWdhaW46 DQogCWZvciAoaSA9IDA7IGkgPCBudW1fcGFnZXM7IGkrKykgew0KIAkJcGFnZXNbaV0gPSBncmFi X2NhY2hlX3BhZ2UoaW5vZGUtPmlfbWFwcGluZywgaW5kZXggKyBpKTsNCiAJCWlmICghcGFnZXNb aV0pIHsNCi0JCQlpbnQgYzsNCi0JCQlmb3IgKGMgPSBpIC0gMTsgYyA+PSAwOyBjLS0pIHsNCi0J CQkJdW5sb2NrX3BhZ2UocGFnZXNbY10pOw0KLQkJCQlwYWdlX2NhY2hlX3JlbGVhc2UocGFnZXNb Y10pOw0KLQkJCX0NCi0JCQlyZXR1cm4gLUVOT01FTTsNCisJCQlmYWlsaSA9IGkgLSAxOw0KKwkJ CWVyciA9IC1FTk9NRU07DQorCQkJZ290byBmYWlsOw0KKwkJfQ0KKw0KKwkJaWYgKGkgPT0gMCkN CisJCQllcnIgPSBwcmVwYXJlX3VwdG9kYXRlX3BhZ2UocGFnZXNbaV0sIHBvcyk7DQorCQllbHNl IGlmIChpID09IG51bV9wYWdlcyAtIDEpDQorCQkJZXJyID0gcHJlcGFyZV91cHRvZGF0ZV9wYWdl KHBhZ2VzW2ldLA0KKwkJCQkJCSAgICBwb3MgKyB3cml0ZV9ieXRlcyk7DQorCQlpZiAoZXJyKSB7 DQorCQkJcGFnZV9jYWNoZV9yZWxlYXNlKHBhZ2VzW2ldKTsNCisJCQlmYWlsaSA9IGkgLSAxOw0K KwkJCWdvdG8gZmFpbDsNCiAJCX0NCiAJCXdhaXRfb25fcGFnZV93cml0ZWJhY2socGFnZXNbaV0p Ow0KIAl9DQorCWVyciA9IDA7DQogCWlmIChzdGFydF9wb3MgPCBpbm9kZS0+aV9zaXplKSB7DQog CQlzdHJ1Y3QgYnRyZnNfb3JkZXJlZF9leHRlbnQgKm9yZGVyZWQ7DQogCQlsb2NrX2V4dGVudF9i aXRzKCZCVFJGU19JKGlub2RlKS0+aW9fdHJlZSwNCkBAIC04NDIsNiArODc5LDE0IEBAIGFnYWlu Og0KIAkJV0FSTl9PTighUGFnZUxvY2tlZChwYWdlc1tpXSkpOw0KIAl9DQogCXJldHVybiAwOw0K K2ZhaWw6DQorCXdoaWxlIChmYWlsaSA+PSAwKSB7DQorCQl1bmxvY2tfcGFnZShwYWdlc1tmYWls aV0pOw0KKwkJcGFnZV9jYWNoZV9yZWxlYXNlKHBhZ2VzW2ZhaWxpXSk7DQorCQlmYWlsaS0tOw0K Kwl9DQorCXJldHVybiBlcnI7DQorDQogfQ0KIA0KIHN0YXRpYyBzc2l6ZV90IGJ0cmZzX2ZpbGVf YWlvX3dyaXRlKHN0cnVjdCBraW9jYiAqaW9jYiwNCkBAIC04NTEsNyArODk2LDYgQEAgc3RhdGlj IHNzaXplX3QgYnRyZnNfZmlsZV9haW9fd3JpdGUoc3RydWN0IGtpb2NiICppb2NiLA0KIAlzdHJ1 Y3QgZmlsZSAqZmlsZSA9IGlvY2ItPmtpX2ZpbHA7DQogCXN0cnVjdCBpbm9kZSAqaW5vZGUgPSBm ZGVudHJ5KGZpbGUpLT5kX2lub2RlOw0KIAlzdHJ1Y3QgYnRyZnNfcm9vdCAqcm9vdCA9IEJUUkZT X0koaW5vZGUpLT5yb290Ow0KLQlzdHJ1Y3QgcGFnZSAqcGlubmVkWzJdOw0KIAlzdHJ1Y3QgcGFn ZSAqKnBhZ2VzID0gTlVMTDsNCiAJc3RydWN0IGlvdl9pdGVyIGk7DQogCWxvZmZfdCAqcHBvcyA9 ICZpb2NiLT5raV9wb3M7DQpAQCAtODcyLDkgKzkxNiw2IEBAIHN0YXRpYyBzc2l6ZV90IGJ0cmZz X2ZpbGVfYWlvX3dyaXRlKHN0cnVjdCBraW9jYiAqaW9jYiwNCiAJd2lsbF93cml0ZSA9ICgoZmls ZS0+Zl9mbGFncyAmIE9fRFNZTkMpIHx8IElTX1NZTkMoaW5vZGUpIHx8DQogCQkgICAgICAoZmls ZS0+Zl9mbGFncyAmIE9fRElSRUNUKSk7DQogDQotCXBpbm5lZFswXSA9IE5VTEw7DQotCXBpbm5l ZFsxXSA9IE5VTEw7DQotDQogCXN0YXJ0X3BvcyA9IHBvczsNCiANCiAJdmZzX2NoZWNrX2Zyb3pl bihpbm9kZS0+aV9zYiwgU0JfRlJFRVpFX1dSSVRFKTsNCkBAIC05NjIsMzIgKzEwMDMsNiBAQCBz dGF0aWMgc3NpemVfdCBidHJmc19maWxlX2Fpb193cml0ZShzdHJ1Y3Qga2lvY2IgKmlvY2IsDQog CWZpcnN0X2luZGV4ID0gcG9zID4+IFBBR0VfQ0FDSEVfU0hJRlQ7DQogCWxhc3RfaW5kZXggPSAo cG9zICsgaW92X2l0ZXJfY291bnQoJmkpKSA+PiBQQUdFX0NBQ0hFX1NISUZUOw0KIA0KLQkvKg0K LQkgKiB0aGVyZSBhcmUgbG90cyBvZiBiZXR0ZXIgd2F5cyB0byBkbyB0aGlzLCBidXQgdGhpcyBj b2RlDQotCSAqIG1ha2VzIHN1cmUgdGhlIGZpcnN0IGFuZCBsYXN0IHBhZ2UgaW4gdGhlIGZpbGUg cmFuZ2UgYXJlDQotCSAqIHVwIHRvIGRhdGUgYW5kIHJlYWR5IGZvciBjb3cNCi0JICovDQotCWlm ICgocG9zICYgKFBBR0VfQ0FDSEVfU0laRSAtIDEpKSkgew0KLQkJcGlubmVkWzBdID0gZ3JhYl9j YWNoZV9wYWdlKGlub2RlLT5pX21hcHBpbmcsIGZpcnN0X2luZGV4KTsNCi0JCWlmICghUGFnZVVw dG9kYXRlKHBpbm5lZFswXSkpIHsNCi0JCQlyZXQgPSBidHJmc19yZWFkcGFnZShOVUxMLCBwaW5u ZWRbMF0pOw0KLQkJCUJVR19PTihyZXQpOw0KLQkJCXdhaXRfb25fcGFnZV9sb2NrZWQocGlubmVk WzBdKTsNCi0JCX0gZWxzZSB7DQotCQkJdW5sb2NrX3BhZ2UocGlubmVkWzBdKTsNCi0JCX0NCi0J fQ0KLQlpZiAoKHBvcyArIGlvdl9pdGVyX2NvdW50KCZpKSkgJiAoUEFHRV9DQUNIRV9TSVpFIC0g MSkpIHsNCi0JCXBpbm5lZFsxXSA9IGdyYWJfY2FjaGVfcGFnZShpbm9kZS0+aV9tYXBwaW5nLCBs YXN0X2luZGV4KTsNCi0JCWlmICghUGFnZVVwdG9kYXRlKHBpbm5lZFsxXSkpIHsNCi0JCQlyZXQg PSBidHJmc19yZWFkcGFnZShOVUxMLCBwaW5uZWRbMV0pOw0KLQkJCUJVR19PTihyZXQpOw0KLQkJ CXdhaXRfb25fcGFnZV9sb2NrZWQocGlubmVkWzFdKTsNCi0JCX0gZWxzZSB7DQotCQkJdW5sb2Nr X3BhZ2UocGlubmVkWzFdKTsNCi0JCX0NCi0JfQ0KLQ0KIAl3aGlsZSAoaW92X2l0ZXJfY291bnQo JmkpID4gMCkgew0KIAkJc2l6ZV90IG9mZnNldCA9IHBvcyAmIChQQUdFX0NBQ0hFX1NJWkUgLSAx KTsNCiAJCXNpemVfdCB3cml0ZV9ieXRlcyA9IG1pbihpb3ZfaXRlcl9jb3VudCgmaSksDQpAQCAt MTAyNCw4ICsxMDM5LDEyIEBAIHN0YXRpYyBzc2l6ZV90IGJ0cmZzX2ZpbGVfYWlvX3dyaXRlKHN0 cnVjdCBraW9jYiAqaW9jYiwNCiANCiAJCWNvcGllZCA9IGJ0cmZzX2NvcHlfZnJvbV91c2VyKHBv cywgbnVtX3BhZ2VzLA0KIAkJCQkJICAgd3JpdGVfYnl0ZXMsIHBhZ2VzLCAmaSk7DQotCQlkaXJ0 eV9wYWdlcyA9IChjb3BpZWQgKyBvZmZzZXQgKyBQQUdFX0NBQ0hFX1NJWkUgLSAxKSA+Pg0KLQkJ CQlQQUdFX0NBQ0hFX1NISUZUOw0KKwkJaWYgKGNvcGllZCA9PSAwKQ0KKwkJCWRpcnR5X3BhZ2Vz ID0gMDsNCisJCWVsc2UNCisJCQlkaXJ0eV9wYWdlcyA9IChjb3BpZWQgKyBvZmZzZXQgKw0KKwkJ CQkgICAgICAgUEFHRV9DQUNIRV9TSVpFIC0gMSkgPj4NCisJCQkJICAgICAgIFBBR0VfQ0FDSEVf U0hJRlQ7DQogDQogCQlpZiAobnVtX3BhZ2VzID4gZGlydHlfcGFnZXMpIHsNCiAJCQlpZiAoY29w aWVkID4gMCkNCkBAIC0xMDY5LDEwICsxMDg4LDYgQEAgb3V0Og0KIAkJZXJyID0gcmV0Ow0KIA0K IAlrZnJlZShwYWdlcyk7DQotCWlmIChwaW5uZWRbMF0pDQotCQlwYWdlX2NhY2hlX3JlbGVhc2Uo cGlubmVkWzBdKTsNCi0JaWYgKHBpbm5lZFsxXSkNCi0JCXBhZ2VfY2FjaGVfcmVsZWFzZShwaW5u ZWRbMV0pOw0KIAkqcHBvcyA9IHBvczsNCiANCiAJLyoNCg== ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-02-28 8:56 ` Zhong, Xin @ 2011-02-28 14:02 ` Chris Mason 0 siblings, 0 replies; 40+ messages in thread From: Chris Mason @ 2011-02-28 14:02 UTC (permalink / raw) To: Zhong, Xin Cc: Mitch Harder, Maria Wikström, Johannes Hirte, linux-btrfs@vger.kernel.org Excerpts from Zhong, Xin's message of 2011-02-28 03:56:40 -0500: > One possible issue I can see is in the random failure case #2 that co= py_from_user only process half of the data.=20 >=20 > For example, if it write a 4k aligned page and copy_from_user only wr= ite 2k. Then it will not call btrfs_delalloc_release_space since num_pa= ges and dirty_pages are both 1.=20 > In the next round, it write another 2k and btrfs_delalloc_reserve_spa= ce is called twice for the same page.=20 >=20 > Is it a problem? Thanks! It should be the correct answer. The result of the short copy_from_use= r should be exactly the same as two write calls where one does 2K and the other does another 2K. Either way, it shouldn't result in incorrect bytes in the file, which i= s still happening for me with the debugging hunks in place. -chris >=20 > -----Original Message----- > From: Chris Mason [mailto:chris.mason@oracle.com]=20 > Sent: Monday, February 28, 2011 9:46 AM > To: Mitch Harder > Cc: Maria Wikstr=C3=B6m; Zhong, Xin; Johannes Hirte; linux-btrfs@vger= =2Ekernel.org > Subject: [PATCH] btrfs file write debugging patch >=20 > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500: > > Some clarification on my previous message... > >=20 > > After looking at my ftrace log more closely, I can see where Btrfs = is > > trying to release the allocated pages. However, the calculation fo= r > > the number of dirty_pages is equal to 1 when "copied =3D=3D 0". > >=20 > > So I'm seeing at least two problems: > > (1) It keeps looping when "copied =3D=3D 0". > > (2) One dirty page is not being released on every loop even though > > "copied =3D=3D 0" (at least this problem keeps it from being an inf= inite > > loop by eventually exhausting reserveable space on the disk). >=20 > Hi everyone, >=20 > There are actually tow bugs here. First the one that Mitch hit, and = a > second one that still results in bad file_write results with my > debugging hunks (the first two hunks below) in place. >=20 > My patch fixes Mitch's bug by checking for copied =3D=3D 0 after > btrfs_copy_from_user and going the correct delalloc accounting. This > one looks solved, but you'll notice the patch is bigger. >=20 > First, I add some random failures to btrfs_copy_from_user() by failin= g > everyone once and a while. This was much more reliable than trying t= o > use memory pressure than making copy_from_user fail. >=20 > If copy_from_user fails and we partially update a page, we end up wit= h a > page that may go away due to memory pressure. But, btrfs_file_write > assumes that only the first and last page may have good data that nee= ds > to be read off the disk. >=20 > This patch ditches that code and puts it into prepare_pages instead. > But I'm still having some errors during long stress.sh runs. Ideas a= re > more than welcome, hopefully some other timezones will kick in ideas > while I sleep. >=20 > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index 7084140..89a6a26 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -54,6 +54,13 @@ static noinline int btrfs_copy_from_user(loff_t po= s, int num_pages, > int offset =3D pos & (PAGE_CACHE_SIZE - 1); > int total_copied =3D 0; > =20 > + if ((jiffies % 10) =3D=3D 0) > + return 0; > + > + if ((jiffies % 25) =3D=3D 0) { > + write_bytes /=3D 2; > + } > + > while (write_bytes > 0) { > size_t count =3D min_t(size_t, > PAGE_CACHE_SIZE - offset, write_bytes); > @@ -763,6 +770,26 @@ out: > } > =20 > /* > + * on error we return an unlocked page and the error value > + * on success we return a locked page and 0 > + */ > +static int prepare_uptodate_page(struct page *page, u64 pos) > +{ > + int ret =3D 0; > + if ((pos & (PAGE_CACHE_SIZE - 1)) && !PageUptodate(page)) { > + ret =3D btrfs_readpage(NULL, page); > + if (ret) > + return ret; > + lock_page(page); > + if (!PageUptodate(page)) { > + unlock_page(page); > + return -EIO; > + } > + } > + return 0; > +} > + > +/* > * this gets pages into the page cache and locks them down, it also = properly > * waits for data=3Dordered extents to finish before allowing the pa= ges to be > * modified. > @@ -777,6 +804,7 @@ static noinline int prepare_pages(struct btrfs_ro= ot *root, struct file *file, > unsigned long index =3D pos >> PAGE_CACHE_SHIFT; > struct inode *inode =3D fdentry(file)->d_inode; > int err =3D 0; > + int faili =3D 0; > u64 start_pos; > u64 last_pos; > =20 > @@ -794,15 +822,24 @@ again: > for (i =3D 0; i < num_pages; i++) { > pages[i] =3D grab_cache_page(inode->i_mapping, index + i); > if (!pages[i]) { > - int c; > - for (c =3D i - 1; c >=3D 0; c--) { > - unlock_page(pages[c]); > - page_cache_release(pages[c]); > - } > - return -ENOMEM; > + faili =3D i - 1; > + err =3D -ENOMEM; > + goto fail; > + } > + > + if (i =3D=3D 0) > + err =3D prepare_uptodate_page(pages[i], pos); > + else if (i =3D=3D num_pages - 1) > + err =3D prepare_uptodate_page(pages[i], > + pos + write_bytes); > + if (err) { > + page_cache_release(pages[i]); > + faili =3D i - 1; > + goto fail; > } > wait_on_page_writeback(pages[i]); > } > + err =3D 0; > if (start_pos < inode->i_size) { > struct btrfs_ordered_extent *ordered; > lock_extent_bits(&BTRFS_I(inode)->io_tree, > @@ -842,6 +879,14 @@ again: > WARN_ON(!PageLocked(pages[i])); > } > return 0; > +fail: > + while (faili >=3D 0) { > + unlock_page(pages[faili]); > + page_cache_release(pages[faili]); > + faili--; > + } > + return err; > + > } > =20 > static ssize_t btrfs_file_aio_write(struct kiocb *iocb, > @@ -851,7 +896,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb = *iocb, > struct file *file =3D iocb->ki_filp; > struct inode *inode =3D fdentry(file)->d_inode; > struct btrfs_root *root =3D BTRFS_I(inode)->root; > - struct page *pinned[2]; > struct page **pages =3D NULL; > struct iov_iter i; > loff_t *ppos =3D &iocb->ki_pos; > @@ -872,9 +916,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb = *iocb, > will_write =3D ((file->f_flags & O_DSYNC) || IS_SYNC(inode) || > (file->f_flags & O_DIRECT)); > =20 > - pinned[0] =3D NULL; > - pinned[1] =3D NULL; > - > start_pos =3D pos; > =20 > vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE); > @@ -962,32 +1003,6 @@ static ssize_t btrfs_file_aio_write(struct kioc= b *iocb, > first_index =3D pos >> PAGE_CACHE_SHIFT; > last_index =3D (pos + iov_iter_count(&i)) >> PAGE_CACHE_SHIFT; > =20 > - /* > - * there are lots of better ways to do this, but this code > - * makes sure the first and last page in the file range are > - * up to date and ready for cow > - */ > - if ((pos & (PAGE_CACHE_SIZE - 1))) { > - pinned[0] =3D grab_cache_page(inode->i_mapping, first_index)= ; > - if (!PageUptodate(pinned[0])) { > - ret =3D btrfs_readpage(NULL, pinned[0]); > - BUG_ON(ret); > - wait_on_page_locked(pinned[0]); > - } else { > - unlock_page(pinned[0]); > - } > - } > - if ((pos + iov_iter_count(&i)) & (PAGE_CACHE_SIZE - 1)) { > - pinned[1] =3D grab_cache_page(inode->i_mapping, last_index); > - if (!PageUptodate(pinned[1])) { > - ret =3D btrfs_readpage(NULL, pinned[1]); > - BUG_ON(ret); > - wait_on_page_locked(pinned[1]); > - } else { > - unlock_page(pinned[1]); > - } > - } > - > while (iov_iter_count(&i) > 0) { > size_t offset =3D pos & (PAGE_CACHE_SIZE - 1); > size_t write_bytes =3D min(iov_iter_count(&i), > @@ -1024,8 +1039,12 @@ static ssize_t btrfs_file_aio_write(struct kio= cb *iocb, > =20 > copied =3D btrfs_copy_from_user(pos, num_pages, > write_bytes, pages, &i); > - dirty_pages =3D (copied + offset + PAGE_CACHE_SIZE - 1) >> > - PAGE_CACHE_SHIFT; > + if (copied =3D=3D 0) > + dirty_pages =3D 0; > + else > + dirty_pages =3D (copied + offset + > + PAGE_CACHE_SIZE - 1) >> > + PAGE_CACHE_SHIFT; > =20 > if (num_pages > dirty_pages) { > if (copied > 0) > @@ -1069,10 +1088,6 @@ out: > err =3D ret; > =20 > kfree(pages); > - if (pinned[0]) > - page_cache_release(pinned[0]); > - if (pinned[1]) > - page_cache_release(pinned[1]); > *ppos =3D pos; > =20 > /* -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-02-28 1:46 ` [PATCH] btrfs file write debugging patch Chris Mason 2011-02-28 8:56 ` Zhong, Xin @ 2011-02-28 10:13 ` Johannes Hirte 2011-02-28 14:00 ` Chris Mason 2011-02-28 16:10 ` Josef Bacik 1 sibling, 2 replies; 40+ messages in thread From: Johannes Hirte @ 2011-02-28 10:13 UTC (permalink / raw) To: Chris Mason Cc: Mitch Harder, Maria Wikström, Zhong, Xin, linux-btrfs@vger.kernel.org On Monday 28 February 2011 02:46:05 Chris Mason wrote: > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500: > > Some clarification on my previous message... > > > > After looking at my ftrace log more closely, I can see where Btrfs is > > trying to release the allocated pages. However, the calculation for > > the number of dirty_pages is equal to 1 when "copied == 0". > > > > So I'm seeing at least two problems: > > (1) It keeps looping when "copied == 0". > > (2) One dirty page is not being released on every loop even though > > "copied == 0" (at least this problem keeps it from being an infinite > > loop by eventually exhausting reserveable space on the disk). > > Hi everyone, > > There are actually tow bugs here. First the one that Mitch hit, and a > second one that still results in bad file_write results with my > debugging hunks (the first two hunks below) in place. > > My patch fixes Mitch's bug by checking for copied == 0 after > btrfs_copy_from_user and going the correct delalloc accounting. This > one looks solved, but you'll notice the patch is bigger. > > First, I add some random failures to btrfs_copy_from_user() by failing > everyone once and a while. This was much more reliable than trying to > use memory pressure than making copy_from_user fail. > > If copy_from_user fails and we partially update a page, we end up with a > page that may go away due to memory pressure. But, btrfs_file_write > assumes that only the first and last page may have good data that needs > to be read off the disk. > > This patch ditches that code and puts it into prepare_pages instead. > But I'm still having some errors during long stress.sh runs. Ideas are > more than welcome, hopefully some other timezones will kick in ideas > while I sleep. At least it doesn't fix the emerge-problem for me. The behavior is now the same as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with no further interaction to get the emerge-process hang with a svn-process consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the spawned svn-process stays and it needs a reboot to get rid of it. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-02-28 10:13 ` Johannes Hirte @ 2011-02-28 14:00 ` Chris Mason 2011-02-28 16:10 ` Josef Bacik 1 sibling, 0 replies; 40+ messages in thread From: Chris Mason @ 2011-02-28 14:00 UTC (permalink / raw) To: Johannes Hirte Cc: Mitch Harder, Maria Wikström, Zhong, Xin, linux-btrfs@vger.kernel.org Excerpts from Johannes Hirte's message of 2011-02-28 05:13:59 -0500: > On Monday 28 February 2011 02:46:05 Chris Mason wrote: > > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500: > > > Some clarification on my previous message... > > > > > > After looking at my ftrace log more closely, I can see where Btrfs is > > > trying to release the allocated pages. However, the calculation for > > > the number of dirty_pages is equal to 1 when "copied == 0". > > > > > > So I'm seeing at least two problems: > > > (1) It keeps looping when "copied == 0". > > > (2) One dirty page is not being released on every loop even though > > > "copied == 0" (at least this problem keeps it from being an infinite > > > loop by eventually exhausting reserveable space on the disk). > > > > Hi everyone, > > > > There are actually tow bugs here. First the one that Mitch hit, and a > > second one that still results in bad file_write results with my > > debugging hunks (the first two hunks below) in place. > > > > My patch fixes Mitch's bug by checking for copied == 0 after > > btrfs_copy_from_user and going the correct delalloc accounting. This > > one looks solved, but you'll notice the patch is bigger. > > > > First, I add some random failures to btrfs_copy_from_user() by failing > > everyone once and a while. This was much more reliable than trying to > > use memory pressure than making copy_from_user fail. > > > > If copy_from_user fails and we partially update a page, we end up with a > > page that may go away due to memory pressure. But, btrfs_file_write > > assumes that only the first and last page may have good data that needs > > to be read off the disk. > > > > This patch ditches that code and puts it into prepare_pages instead. > > But I'm still having some errors during long stress.sh runs. Ideas are > > more than welcome, hopefully some other timezones will kick in ideas > > while I sleep. > > At least it doesn't fix the emerge-problem for me. The behavior is now the same > as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with no > further interaction to get the emerge-process hang with a svn-process > consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the > spawned svn-process stays and it needs a reboot to get rid of it. I think your problem really is more enospc related. Still working on that as well. But please don't try the patch without removing the debugging hunk at the top (anything that mentions jiffies). -chris ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-02-28 10:13 ` Johannes Hirte 2011-02-28 14:00 ` Chris Mason @ 2011-02-28 16:10 ` Josef Bacik 2011-02-28 16:45 ` Maria Wikström 1 sibling, 1 reply; 40+ messages in thread From: Josef Bacik @ 2011-02-28 16:10 UTC (permalink / raw) To: Johannes Hirte Cc: Chris Mason, Mitch Harder, Maria Wikström, Zhong, Xin, linux-btrfs@vger.kernel.org On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: > On Monday 28 February 2011 02:46:05 Chris Mason wrote: > > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500: > > > Some clarification on my previous message... > > > > > > After looking at my ftrace log more closely, I can see where Btrfs is > > > trying to release the allocated pages. However, the calculation for > > > the number of dirty_pages is equal to 1 when "copied == 0". > > > > > > So I'm seeing at least two problems: > > > (1) It keeps looping when "copied == 0". > > > (2) One dirty page is not being released on every loop even though > > > "copied == 0" (at least this problem keeps it from being an infinite > > > loop by eventually exhausting reserveable space on the disk). > > > > Hi everyone, > > > > There are actually tow bugs here. First the one that Mitch hit, and a > > second one that still results in bad file_write results with my > > debugging hunks (the first two hunks below) in place. > > > > My patch fixes Mitch's bug by checking for copied == 0 after > > btrfs_copy_from_user and going the correct delalloc accounting. This > > one looks solved, but you'll notice the patch is bigger. > > > > First, I add some random failures to btrfs_copy_from_user() by failing > > everyone once and a while. This was much more reliable than trying to > > use memory pressure than making copy_from_user fail. > > > > If copy_from_user fails and we partially update a page, we end up with a > > page that may go away due to memory pressure. But, btrfs_file_write > > assumes that only the first and last page may have good data that needs > > to be read off the disk. > > > > This patch ditches that code and puts it into prepare_pages instead. > > But I'm still having some errors during long stress.sh runs. Ideas are > > more than welcome, hopefully some other timezones will kick in ideas > > while I sleep. > > At least it doesn't fix the emerge-problem for me. The behavior is now the same > as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with no > further interaction to get the emerge-process hang with a svn-process > consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the > spawned svn-process stays and it needs a reboot to get rid of it. Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's looping? Thanks, Josef ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-02-28 16:10 ` Josef Bacik @ 2011-02-28 16:45 ` Maria Wikström 2011-02-28 17:47 ` Mitch Harder 0 siblings, 1 reply; 40+ messages in thread From: Maria Wikström @ 2011-02-28 16:45 UTC (permalink / raw) To: Josef Bacik Cc: Johannes Hirte, Chris Mason, Mitch Harder, Zhong, Xin, linux-btrfs@vger.kernel.org m=C3=A5n 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik: > On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: > > On Monday 28 February 2011 02:46:05 Chris Mason wrote: > > > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500= : > > > > Some clarification on my previous message... > > > >=20 > > > > After looking at my ftrace log more closely, I can see where Bt= rfs is > > > > trying to release the allocated pages. However, the calculatio= n for > > > > the number of dirty_pages is equal to 1 when "copied =3D=3D 0". > > > >=20 > > > > So I'm seeing at least two problems: > > > > (1) It keeps looping when "copied =3D=3D 0". > > > > (2) One dirty page is not being released on every loop even th= ough > > > > "copied =3D=3D 0" (at least this problem keeps it from being an= infinite > > > > loop by eventually exhausting reserveable space on the disk). > > >=20 > > > Hi everyone, > > >=20 > > > There are actually tow bugs here. First the one that Mitch hit, = and a > > > second one that still results in bad file_write results with my > > > debugging hunks (the first two hunks below) in place. > > >=20 > > > My patch fixes Mitch's bug by checking for copied =3D=3D 0 after > > > btrfs_copy_from_user and going the correct delalloc accounting. = This > > > one looks solved, but you'll notice the patch is bigger. > > >=20 > > > First, I add some random failures to btrfs_copy_from_user() by fa= iling > > > everyone once and a while. This was much more reliable than tryi= ng to > > > use memory pressure than making copy_from_user fail. > > >=20 > > > If copy_from_user fails and we partially update a page, we end up= with a > > > page that may go away due to memory pressure. But, btrfs_file_wr= ite > > > assumes that only the first and last page may have good data that= needs > > > to be read off the disk. > > >=20 > > > This patch ditches that code and puts it into prepare_pages inste= ad. > > > But I'm still having some errors during long stress.sh runs. Ide= as are > > > more than welcome, hopefully some other timezones will kick in id= eas > > > while I sleep. > >=20 > > At least it doesn't fix the emerge-problem for me. The behavior is = now the same=20 > > as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt= ' with no=20 > > further interaction to get the emerge-process hang with a svn-proce= ss=20 > > consuming 100% CPU. I can cancel the emerge-process with ctrl-c but= the=20 > > spawned svn-process stays and it needs a reboot to get rid of it.=20 >=20 > Can you cat /proc/$pid/wchan a few times so we can get an idea of whe= re it's > looping? Thanks, >=20 > Josef It behaves the same way here with btrfs-unstable. The output of "cat /proc/$pid/wchan" is 0.=20 // Maria > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs= " in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >=20 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-02-28 16:45 ` Maria Wikström @ 2011-02-28 17:47 ` Mitch Harder 2011-02-28 20:20 ` Mitch Harder 0 siblings, 1 reply; 40+ messages in thread From: Mitch Harder @ 2011-02-28 17:47 UTC (permalink / raw) To: Maria Wikström Cc: Josef Bacik, Johannes Hirte, Chris Mason, Zhong, Xin, linux-btrfs@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 3729 bytes --] 2011/2/28 Maria Wikström <maria@ponstudios.se>: > mån 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik: >> On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: >> > On Monday 28 February 2011 02:46:05 Chris Mason wrote: >> > > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500: >> > > > Some clarification on my previous message... >> > > > >> > > > After looking at my ftrace log more closely, I can see where Btrfs is >> > > > trying to release the allocated pages. However, the calculation for >> > > > the number of dirty_pages is equal to 1 when "copied == 0". >> > > > >> > > > So I'm seeing at least two problems: >> > > > (1) It keeps looping when "copied == 0". >> > > > (2) One dirty page is not being released on every loop even though >> > > > "copied == 0" (at least this problem keeps it from being an infinite >> > > > loop by eventually exhausting reserveable space on the disk). >> > > >> > > Hi everyone, >> > > >> > > There are actually tow bugs here. First the one that Mitch hit, and a >> > > second one that still results in bad file_write results with my >> > > debugging hunks (the first two hunks below) in place. >> > > >> > > My patch fixes Mitch's bug by checking for copied == 0 after >> > > btrfs_copy_from_user and going the correct delalloc accounting. This >> > > one looks solved, but you'll notice the patch is bigger. >> > > >> > > First, I add some random failures to btrfs_copy_from_user() by failing >> > > everyone once and a while. This was much more reliable than trying to >> > > use memory pressure than making copy_from_user fail. >> > > >> > > If copy_from_user fails and we partially update a page, we end up with a >> > > page that may go away due to memory pressure. But, btrfs_file_write >> > > assumes that only the first and last page may have good data that needs >> > > to be read off the disk. >> > > >> > > This patch ditches that code and puts it into prepare_pages instead. >> > > But I'm still having some errors during long stress.sh runs. Ideas are >> > > more than welcome, hopefully some other timezones will kick in ideas >> > > while I sleep. >> > >> > At least it doesn't fix the emerge-problem for me. The behavior is now the same >> > as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with no >> > further interaction to get the emerge-process hang with a svn-process >> > consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the >> > spawned svn-process stays and it needs a reboot to get rid of it. >> >> Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's >> looping? Thanks, >> >> Josef > > It behaves the same way here with btrfs-unstable. > The output of "cat /proc/$pid/wchan" is 0. > > // Maria > >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > I've applied the patch at the head of this thread (with the jiffies debugging commented out) and I'm attaching a ftrace using the function_graph tracer when I'm stuck in the loop. I've just snipped out a couple of the loops (the full trace file is quite large, and mostly repititious). I'm going to try to modify file.c with some trace_printk debugging to show the values of several of the relevant variables at various stages. I'm going to try to exit the loop after 256 tries with an EFAULT so I can stop the tracing at that point and capture a trace of the entry into the problem (the ftrace ring buffer fills up too fast for me to capture the entry point). [-- Attachment #2: ftrace-btrfs-file-write-debugging.gz --] [-- Type: application/x-gzip, Size: 2590 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-02-28 17:47 ` Mitch Harder @ 2011-02-28 20:20 ` Mitch Harder 2011-03-01 5:09 ` Mitch Harder ` (2 more replies) 0 siblings, 3 replies; 40+ messages in thread From: Mitch Harder @ 2011-02-28 20:20 UTC (permalink / raw) To: Maria Wikström Cc: Josef Bacik, Johannes Hirte, Chris Mason, Zhong, Xin, linux-btrfs@vger.kernel.org 2011/2/28 Mitch Harder <mitch.harder@sabayonlinux.org>: > 2011/2/28 Maria Wikstr=F6m <maria@ponstudios.se>: >> m=E5n 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik: >>> On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: >>> > On Monday 28 February 2011 02:46:05 Chris Mason wrote: >>> > > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -05= 00: >>> > > > Some clarification on my previous message... >>> > > > >>> > > > After looking at my ftrace log more closely, I can see where = Btrfs is >>> > > > trying to release the allocated pages. =A0However, the calcul= ation for >>> > > > the number of dirty_pages is equal to 1 when "copied =3D=3D 0= ". >>> > > > >>> > > > So I'm seeing at least two problems: >>> > > > (1) =A0It keeps looping when "copied =3D=3D 0". >>> > > > (2) =A0One dirty page is not being released on every loop eve= n though >>> > > > "copied =3D=3D 0" (at least this problem keeps it from being = an infinite >>> > > > loop by eventually exhausting reserveable space on the disk). >>> > > >>> > > Hi everyone, >>> > > >>> > > There are actually tow bugs here. =A0First the one that Mitch h= it, and a >>> > > second one that still results in bad file_write results with my >>> > > debugging hunks (the first two hunks below) in place. >>> > > >>> > > My patch fixes Mitch's bug by checking for copied =3D=3D 0 afte= r >>> > > btrfs_copy_from_user and going the correct delalloc accounting.= =A0This >>> > > one looks solved, but you'll notice the patch is bigger. >>> > > >>> > > First, I add some random failures to btrfs_copy_from_user() by = failing >>> > > everyone once and a while. =A0This was much more reliable than = trying to >>> > > use memory pressure than making copy_from_user fail. >>> > > >>> > > If copy_from_user fails and we partially update a page, we end = up with a >>> > > page that may go away due to memory pressure. =A0But, btrfs_fil= e_write >>> > > assumes that only the first and last page may have good data th= at needs >>> > > to be read off the disk. >>> > > >>> > > This patch ditches that code and puts it into prepare_pages ins= tead. >>> > > But I'm still having some errors during long stress.sh runs. =A0= Ideas are >>> > > more than welcome, hopefully some other timezones will kick in = ideas >>> > > while I sleep. >>> > >>> > At least it doesn't fix the emerge-problem for me. The behavior i= s now the same >>> > as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcry= pt' with no >>> > further interaction to get the emerge-process hang with a svn-pro= cess >>> > consuming 100% CPU. I can cancel the emerge-process with ctrl-c b= ut the >>> > spawned svn-process stays and it needs a reboot to get rid of it. >>> >>> Can you cat /proc/$pid/wchan a few times so we can get an idea of w= here it's >>> looping? =A0Thanks, >>> >>> Josef >> >> It behaves the same way here with btrfs-unstable. >> The output of "cat /proc/$pid/wchan" is 0. >> >> // Maria >> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btr= fs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.htm= l >>> >> >> >> > > I've applied the patch at the head of this thread (with the jiffies > debugging commented out) and I'm attaching a ftrace using the > function_graph tracer when I'm stuck in the loop. =A0I've just snippe= d > out a couple of the loops (the full trace file is quite large, and > mostly repititious). > > I'm going to try to modify file.c with some trace_printk debugging to > show the values of several of the relevant variables at various > stages. > > I'm going to try to exit the loop after 256 tries with an EFAULT so I > can stop the tracing at that point and capture a trace of the entry > into the problem (the ftrace ring buffer fills up too fast for me to > capture the entry point). > As promised, I'm put together a modified file.c with many trace_printk debugging statements to augment the ftrace. The trace is ~128K compressed (about 31,600 lines or 2.6MB uncompressed), so I'm putting it up on my local server instead of attaching. Let me know if it would be more appropriate to send to the list as an attachment. http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz I preface all my trace_printk comments with "TPK:" to make skipping through the trace easier. The trace contains the trace of about 3 or 4 successful passes through the btrfs_file_aio_write() function to show what a successful trace looks like. The pass through the btrfs_file_aio_write() that breaks begins on line = 1088. I let it loop through the while (iov_iter_count(&i) > 0) {} loop for 256 times when copied=3D=3D0 (otherwise it would loop infinitely). The= n exit out and stop the trace. =46or reference, here's a diff file for the debugging statements I've added to file.c: Let me know if you would like me to re-run this trial with different debugging lines. fs/btrfs/file.c /usr/src/linux/fs/btrfs/file.c --- fs/btrfs/file.c 2011-02-28 10:13:45.000000000 -0600 +++ /usr/src/linux/fs/btrfs/file.c 2011-02-28 13:07:11.000000000 -0600 @@ -53,12 +53,14 @@ int offset =3D pos & (PAGE_CACHE_SIZE - 1); int total_copied =3D 0; + /*************************** if ((jiffies % 10) =3D=3D 0) return 0; if ((jiffies % 25) =3D=3D 0) { write_bytes /=3D 2; } + **************************/ while (write_bytes > 0) { size_t count =3D min_t(size_t, @@ -82,10 +84,13 @@ /* Return to btrfs_file_aio_write to fault page */ if (unlikely(copied =3D=3D 0)) { + trace_printk("TPK: unlikely copied =3D=3D 0 in btrfs_copy_from_user (total_copied=3D%i)\n", + total_copied); break; } if (unlikely(copied < PAGE_CACHE_SIZE - offset)) { + trace_printk("TPK: unlikely copied < PAGE_CACHE_SIZE - offset in btrfs_copy_from_user\n"); offset +=3D copied; } else { pg++; @@ -910,8 +915,13 @@ int will_write; int buffered =3D 0; int copied =3D 0; + int copied_problem =3D 0; + int copied_loop_count =3D 0; + int stop_ftrace =3D 0; int dirty_pages =3D 0; + trace_printk("TPK: Entering btrfs_file_aio_write()\n"); + will_write =3D ((file->f_flags & O_DSYNC) || IS_SYNC(inode) || (file->f_flags & O_DIRECT)); @@ -953,6 +963,7 @@ BTRFS_I(inode)->sequence++; if (unlikely(file->f_flags & O_DIRECT)) { + trace_printk("TPK: transferred to unlikely(file->f_flags \& O_DIRECT= )\n"); num_written =3D generic_file_direct_write(iocb, iov, &nr_segs, pos, ppos, count, ocount); @@ -984,6 +995,8 @@ */ buffered =3D 1; pos +=3D num_written; + trace_printk("TPK: end unlikely(file->f_flags \& O_DIRECT) with num_written=3D%i\n", + num_written); } iov_iter_init(&i, iov, nr_segs, count, num_written); @@ -1003,6 +1016,8 @@ last_index =3D (pos + iov_iter_count(&i)) >> PAGE_CACHE_SHIFT; while (iov_iter_count(&i) > 0) { + trace_printk("TPK: start section while (iov_iter_count() > 0)\n"); + size_t offset =3D pos & (PAGE_CACHE_SIZE - 1); size_t write_bytes =3D min(iov_iter_count(&i), nrptrs * (size_t)PAGE_CACHE_SIZE - @@ -1010,6 +1025,9 @@ size_t num_pages =3D (write_bytes + offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; + trace_printk("TPK: iov_iter_count()=3D%i || nr_segs=3D%lu || nrptrs=3D= %i || offset=3D%lu || write_bytes=3D%lu || num_pages=3D%lu\n", + iov_iter_count(&i), nr_segs, nrptrs, offset, write_bytes, num_pa= ges); + WARN_ON(num_pages > nrptrs); memset(pages, 0, sizeof(struct page *) * nrptrs); @@ -1022,6 +1040,19 @@ goto out; } + if (unlikely(copied_problem)) { + copied_loop_count++; + trace_printk("TPK: copied problem(%i)\n", + copied_loop_count); + /* Give up if we've already tried 256 times */ + if (copied_loop_count > 256) { + ret =3D -EFAULT; + stop_ftrace =3D 1; + trace_printk("TPK: copied loop count exceeded, returning EFAULT...= =2E\n"); + goto out; + } + } + ret =3D btrfs_delalloc_reserve_space(inode, num_pages << PAGE_CACHE_SHIFT); if (ret) @@ -1045,6 +1076,10 @@ PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; + if (copied =3D=3D 0) { + copied_problem =3D 1; + } + if (num_pages > dirty_pages) { if (copied > 0) atomic_inc( @@ -1080,11 +1115,19 @@ num_written +=3D copied; cond_resched(); + trace_printk("TPK: end section while (iov_iter_count() > 0)\n"); + trace_printk(" copied=3D%i || dirty_pages=3D%i || num_written=3D%i |= | ending iov_iter_count=3D%i\n", + copied, dirty_pages, num_written, iov_iter_count(&i) ); } out: + trace_printk("TPK: Reached: out:\n"); + mutex_unlock(&inode->i_mutex); - if (ret) + if (ret) { err =3D ret; + trace_printk("TPK: ret,err had value %i when out: was reached (num_written: %i)\n", + ret, num_written); + } kfree(pages); *ppos =3D pos; @@ -1140,6 +1183,19 @@ } done: current->backing_dev_info =3D NULL; + if (ret) { + trace_printk("TPK: btrfs_file_aio_write exiting with non-zero ret value (%i)\n", ret); + trace_printk("TPK: Returning num_written (%i) ? num_written (%i) : err (%i) =3D %i\n", + num_written, num_written, err, num_written ? num_written : err); + } else { + trace_printk("TPK: btrfs_file_aio_write exiting normally with (%i)", + num_written ? num_written : err); + } +=09 + if (unlikely(stop_ftrace)) { + tracing_off(); + } + return num_written ? num_written : err; } -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-02-28 20:20 ` Mitch Harder @ 2011-03-01 5:09 ` Mitch Harder 2011-03-01 10:14 ` Zhong, Xin 2011-03-01 21:56 ` Piotr Szymaniak 2 siblings, 0 replies; 40+ messages in thread From: Mitch Harder @ 2011-03-01 5:09 UTC (permalink / raw) To: Maria Wikström Cc: Josef Bacik, Johannes Hirte, Chris Mason, Zhong, Xin, linux-btrfs@vger.kernel.org 2011/2/28 Mitch Harder <mitch.harder@sabayonlinux.org>: > 2011/2/28 Mitch Harder <mitch.harder@sabayonlinux.org>: >> 2011/2/28 Maria Wikstr=F6m <maria@ponstudios.se>: >>> m=E5n 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik: >>>> On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: >>>> > On Monday 28 February 2011 02:46:05 Chris Mason wrote: >>>> > > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0= 500: >>>> > > > Some clarification on my previous message... >>>> > > > >>>> > > > After looking at my ftrace log more closely, I can see where= Btrfs is >>>> > > > trying to release the allocated pages. =A0However, the calcu= lation for >>>> > > > the number of dirty_pages is equal to 1 when "copied =3D=3D = 0". >>>> > > > >>>> > > > So I'm seeing at least two problems: >>>> > > > (1) =A0It keeps looping when "copied =3D=3D 0". >>>> > > > (2) =A0One dirty page is not being released on every loop ev= en though >>>> > > > "copied =3D=3D 0" (at least this problem keeps it from being= an infinite >>>> > > > loop by eventually exhausting reserveable space on the disk)= =2E >>>> > > >>>> > > Hi everyone, >>>> > > >>>> > > There are actually tow bugs here. =A0First the one that Mitch = hit, and a >>>> > > second one that still results in bad file_write results with m= y >>>> > > debugging hunks (the first two hunks below) in place. >>>> > > >>>> > > My patch fixes Mitch's bug by checking for copied =3D=3D 0 aft= er >>>> > > btrfs_copy_from_user and going the correct delalloc accounting= =2E =A0This >>>> > > one looks solved, but you'll notice the patch is bigger. >>>> > > >>>> > > First, I add some random failures to btrfs_copy_from_user() by= failing >>>> > > everyone once and a while. =A0This was much more reliable than= trying to >>>> > > use memory pressure than making copy_from_user fail. >>>> > > >>>> > > If copy_from_user fails and we partially update a page, we end= up with a >>>> > > page that may go away due to memory pressure. =A0But, btrfs_fi= le_write >>>> > > assumes that only the first and last page may have good data t= hat needs >>>> > > to be read off the disk. >>>> > > >>>> > > This patch ditches that code and puts it into prepare_pages in= stead. >>>> > > But I'm still having some errors during long stress.sh runs. =A0= Ideas are >>>> > > more than welcome, hopefully some other timezones will kick in= ideas >>>> > > while I sleep. >>>> > >>>> > At least it doesn't fix the emerge-problem for me. The behavior = is now the same >>>> > as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcr= ypt' with no >>>> > further interaction to get the emerge-process hang with a svn-pr= ocess >>>> > consuming 100% CPU. I can cancel the emerge-process with ctrl-c = but the >>>> > spawned svn-process stays and it needs a reboot to get rid of it= =2E >>>> >>>> Can you cat /proc/$pid/wchan a few times so we can get an idea of = where it's >>>> looping? =A0Thanks, >>>> >>>> Josef >>> >>> It behaves the same way here with btrfs-unstable. >>> The output of "cat /proc/$pid/wchan" is 0. >>> >>> // Maria >>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-bt= rfs" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.ht= ml >>>> >>> >>> >>> >> >> I've applied the patch at the head of this thread (with the jiffies >> debugging commented out) and I'm attaching a ftrace using the >> function_graph tracer when I'm stuck in the loop. =A0I've just snipp= ed >> out a couple of the loops (the full trace file is quite large, and >> mostly repititious). >> >> I'm going to try to modify file.c with some trace_printk debugging t= o >> show the values of several of the relevant variables at various >> stages. >> >> I'm going to try to exit the loop after 256 tries with an EFAULT so = I >> can stop the tracing at that point and capture a trace of the entry >> into the problem (the ftrace ring buffer fills up too fast for me to >> capture the entry point). >> > > As promised, I'm put together a modified file.c with many trace_print= k > debugging statements to augment the ftrace. > > The trace is ~128K compressed (about 31,600 lines or 2.6MB > uncompressed), so I'm putting it up on my local server instead of > attaching. =A0Let me know if it would be more appropriate to send to = the > list as an attachment. > > http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz > > I preface all my trace_printk comments with "TPK:" to make skipping > through the trace easier. > > The trace contains the trace of about 3 or 4 successful passes throug= h > the btrfs_file_aio_write() function to show what a successful trace > looks like. > > The pass through the btrfs_file_aio_write() that breaks begins on lin= e 1088. > > I let it loop through the while (iov_iter_count(&i) > 0) {} loop for > 256 times when copied=3D=3D0 (otherwise it would loop infinitely). =A0= Then > exit out and stop the trace. > > For reference, here's a diff file for the debugging statements I've > added to file.c: > > Let me know if you would like me to re-run this trial with different > debugging lines. > > =A0fs/btrfs/file.c /usr/src/linux/fs/btrfs/file.c > --- fs/btrfs/file.c =A0 =A0 2011-02-28 10:13:45.000000000 -0600 > +++ /usr/src/linux/fs/btrfs/file.c =A0 =A0 =A02011-02-28 13:07:11.000= 000000 -0600 > @@ -53,12 +53,14 @@ > =A0 =A0 =A0 =A0int offset =3D pos & (PAGE_CACHE_SIZE - 1); > =A0 =A0 =A0 =A0int total_copied =3D 0; > > + =A0 =A0 =A0 /*************************** > =A0 =A0 =A0 =A0if ((jiffies % 10) =3D=3D 0) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return 0; > > =A0 =A0 =A0 =A0if ((jiffies % 25) =3D=3D 0) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0write_bytes /=3D 2; > =A0 =A0 =A0 =A0} > + =A0 =A0 =A0 **************************/ > > =A0 =A0 =A0 =A0while (write_bytes > 0) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0size_t count =3D min_t(size_t, > @@ -82,10 +84,13 @@ > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0/* Return to btrfs_file_aio_write to f= ault page */ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (unlikely(copied =3D=3D 0)) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: unli= kely copied =3D=3D 0 in btrfs_copy_from_user > (total_copied=3D%i)\n", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0total_copied); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0break; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (unlikely(copied < PAGE_CACHE_SIZE = - offset)) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: unli= kely copied < PAGE_CACHE_SIZE - offset in > btrfs_copy_from_user\n"); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0offset +=3D copied; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} else { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0pg++; > @@ -910,8 +915,13 @@ > =A0 =A0 =A0 =A0int will_write; > =A0 =A0 =A0 =A0int buffered =3D 0; > =A0 =A0 =A0 =A0int copied =3D 0; > + =A0 =A0 =A0 int copied_problem =3D 0; > + =A0 =A0 =A0 int copied_loop_count =3D 0; > + =A0 =A0 =A0 int stop_ftrace =3D 0; > =A0 =A0 =A0 =A0int dirty_pages =3D 0; > > + =A0 =A0 =A0 trace_printk("TPK: Entering btrfs_file_aio_write()\n"); > + > =A0 =A0 =A0 =A0will_write =3D ((file->f_flags & O_DSYNC) || IS_SYNC(i= node) || > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0(file->f_flags & O_DIRECT)= ); > > @@ -953,6 +963,7 @@ > =A0 =A0 =A0 =A0BTRFS_I(inode)->sequence++; > > =A0 =A0 =A0 =A0if (unlikely(file->f_flags & O_DIRECT)) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: transferred to unlik= ely(file->f_flags \& O_DIRECT)\n"); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0num_written =3D generic_file_direct_wr= ite(iocb, iov, &nr_segs, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0pos, ppos, count, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ocount); > @@ -984,6 +995,8 @@ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 */ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0buffered =3D 1; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0pos +=3D num_written; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: end unlikely(file->f= _flags \& O_DIRECT) with > num_written=3D%i\n", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0num_written)= ; > =A0 =A0 =A0 =A0} > > =A0 =A0 =A0 =A0iov_iter_init(&i, iov, nr_segs, count, num_written); > @@ -1003,6 +1016,8 @@ > =A0 =A0 =A0 =A0last_index =3D (pos + iov_iter_count(&i)) >> PAGE_CACH= E_SHIFT; > > =A0 =A0 =A0 =A0while (iov_iter_count(&i) > 0) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: start section while = (iov_iter_count() > 0)\n"); > + > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0size_t offset =3D pos & (PAGE_CACHE_SI= ZE - 1); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0size_t write_bytes =3D min(iov_iter_co= unt(&i), > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 nrptrs * (size_t)PAGE_CACHE_SIZE - > @@ -1010,6 +1025,9 @@ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0size_t num_pages =3D (write_bytes + of= fset + > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: iov_iter_count()=3D%= i || nr_segs=3D%lu || nrptrs=3D%i > || offset=3D%lu || write_bytes=3D%lu || num_pages=3D%lu\n", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 iov_iter_count(&i), nr_segs, nrptrs, offset, w= rite_bytes, num_pages); > + > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0WARN_ON(num_pages > nrptrs); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0memset(pages, 0, sizeof(struct page *)= * nrptrs); > > @@ -1022,6 +1040,19 @@ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0goto out; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (unlikely(copied_problem)) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 copied_loop_count++; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: copi= ed problem(%i)\n", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0copied_loop_count); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* Give up if we've alr= eady tried 256 times */ > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (copied_loop_count >= 256) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ret =3D= -EFAULT; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 stop_ft= race =3D 1; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_p= rintk("TPK: copied loop count exceeded, returning EFAULT....\n"); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto ou= t; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret =3D btrfs_delalloc_reserve_space(i= node, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0num_pages << PAGE_CACHE_SHIFT); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (ret) > @@ -1045,6 +1076,10 @@ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 PAGE_CACHE_SIZE - 1) >> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 PAGE_CACHE_SHIFT; > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (copied =3D=3D 0) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 copied_problem =3D 1; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (num_pages > dirty_pages) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (copied > 0) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0atomic= _inc( > @@ -1080,11 +1115,19 @@ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0num_written +=3D copied; > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0cond_resched(); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: end section while (i= ov_iter_count() > 0)\n"); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk(" copied=3D%i || dirty_pag= es=3D%i || num_written=3D%i || > ending iov_iter_count=3D%i\n", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 copied, dirty_pages, num_written, iov_iter_cou= nt(&i) ); > =A0 =A0 =A0 =A0} > =A0out: > + =A0 =A0 =A0 trace_printk("TPK: Reached: out:\n"); > + > =A0 =A0 =A0 =A0mutex_unlock(&inode->i_mutex); > - =A0 =A0 =A0 if (ret) > + =A0 =A0 =A0 if (ret) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0err =3D ret; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: ret,err had value %i= when out: was reached > (num_written: %i)\n", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret, num_wri= tten); > + =A0 =A0 =A0 } > > =A0 =A0 =A0 =A0kfree(pages); > =A0 =A0 =A0 =A0*ppos =3D pos; > @@ -1140,6 +1183,19 @@ > =A0 =A0 =A0 =A0} > =A0done: > =A0 =A0 =A0 =A0current->backing_dev_info =3D NULL; > + =A0 =A0 =A0 if (ret) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: btrfs_file_aio_write= exiting with non-zero ret > value (%i)\n", ret); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: Returning num_writte= n (%i) ? num_written (%i) : > err (%i) =3D %i\n", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 num_wri= tten, num_written, err, num_written ? num_written : err); > + =A0 =A0 =A0 } else { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 trace_printk("TPK: btrfs_file_aio_write= exiting normally with (%i)", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0num_written = ? num_written : err); > + =A0 =A0 =A0 } > + > + =A0 =A0 =A0 if (unlikely(stop_ftrace)) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 tracing_off(); > + =A0 =A0 =A0 } > + > =A0 =A0 =A0 =A0return num_written ? num_written : err; > =A0} > I'm developing a hypothesis that something is going wrong when Btrfs is trying to lock multiple pages. Page faults are being encountered at the same spot, over and over (BTW, I ran a memcheck to rule that possibility out). If I scan through my traces, it looks like most calls to write are submitted 1 block (4096 bytes) at a time, at the most (also many are < 4096 bytes). The portion that is causing a problem is trying to write 12288 bytes (3k). For some reason, the first 24 bytes are written, and the remainder of the loop is spent on the 12264 that are remaining. But page faults are encountered on each loop, and no more bytes are copied. I modified file.c to cut back on the scope of the attempted write to smaller chunks. The following patch "fixes" my problem, and this build completes without error. I'm not submitting this patch as a solution. It's clearly papering over a deeper issue. However, I think it gives insight into the problem. I wrote the patch to allow for sequentially smaller blocks if failures continue. But the block cleared once I limited the scope to 4096 bytes. It never needed to try smaller data segments. As hoped, limiting the scope of the write allowed the 12264 bytes to be written in the next three subsequent loops after lowering the scope of the write. It was interesting to note that cutting the scope to 4096 didn't guarantee that you were limiting the write to one block. There was usually some overlap, and 2 dirty block needed to be written. But I'm still suspicious that there is a mismatch somewhere when trying to lock multiple blocks. Here's the debugging patch I constructed which actually allowed the build to complete without error. This was applied (for testing purposes) on top of the previous debugging patch. As noted earlier, it never went lower than a 4096 byte window for write_bytes. --- file.c.file-write-patch-v1 2011-02-28 13:06:37.000000000 -0600 +++ file.c 2011-02-28 19:23:21.000000000 -0600 @@ -908,6 +908,7 @@ ssize_t err =3D 0; size_t count; size_t ocount; + size_t write_bytes =3D 0; int ret =3D 0; int nrptrs; unsigned long first_index; @@ -963,7 +964,7 @@ BTRFS_I(inode)->sequence++; if (unlikely(file->f_flags & O_DIRECT)) { - trace_printk("TPK: transferred to unlikely(file->f_flags \& O_DIRECT= )\n"); + trace_printk("TPK: transferred to unlikely(file->f_flags & O_DIRECT)= \n"); num_written =3D generic_file_direct_write(iocb, iov, &nr_segs, pos, ppos, count, ocount); @@ -995,7 +996,7 @@ */ buffered =3D 1; pos +=3D num_written; - trace_printk("TPK: end unlikely(file->f_flags \& O_DIRECT) with num_written=3D%i\n", + trace_printk("TPK: end unlikely(file->f_flags & O_DIRECT) with num_written=3D%i\n", num_written); } @@ -1019,9 +1020,20 @@ trace_printk("TPK: start section while (iov_iter_count() > 0)\n"); size_t offset =3D pos & (PAGE_CACHE_SIZE - 1); - size_t write_bytes =3D min(iov_iter_count(&i), - nrptrs * (size_t)PAGE_CACHE_SIZE - - offset); + write_bytes =3D min(iov_iter_count(&i), + nrptrs * (size_t)PAGE_CACHE_SIZE - + offset); + if (unlikely(copied_problem)) { + if (copied_loop_count > 128) { + write_bytes =3D min(write_bytes, 4096); + } else if (copied_loop_count > 160) { + write_bytes =3D min(write_bytes, 2048); + } else if (copied_loop_count > 192) { + write_bytes =3D min(write_bytes, 1024); + } else if (copied_loop_count > 224) { + write_bytes =3D min(write_bytes, 512); + } + } size_t num_pages =3D (write_bytes + offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-02-28 20:20 ` Mitch Harder 2011-03-01 5:09 ` Mitch Harder @ 2011-03-01 10:14 ` Zhong, Xin 2011-03-01 11:56 ` Zhong, Xin 2011-03-01 14:51 ` Mitch Harder 2011-03-01 21:56 ` Piotr Szymaniak 2 siblings, 2 replies; 40+ messages in thread From: Zhong, Xin @ 2011-03-01 10:14 UTC (permalink / raw) To: Mitch Harder, Maria Wikström Cc: Josef Bacik, Johannes Hirte, Chris Mason, linux-btrfs@vger.kernel.org Is your system running out of memory or is there any other thread like = flush-btrfs competing for the same page? I can only see one process in your ftrace log. You may need to trace al= l btrfs.ko function calls instead of a single process. Thanks! -----Original Message----- =46rom: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vge= r.kernel.org] On Behalf Of Mitch Harder Sent: Tuesday, March 01, 2011 4:20 AM To: Maria Wikstr=F6m Cc: Josef Bacik; Johannes Hirte; Chris Mason; Zhong, Xin; linux-btrfs@v= ger.kernel.org Subject: Re: [PATCH] btrfs file write debugging patch 2011/2/28 Mitch Harder <mitch.harder@sabayonlinux.org>: > 2011/2/28 Maria Wikstr=F6m <maria@ponstudios.se>: >> m=E5n 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik: >>> On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: >>> > On Monday 28 February 2011 02:46:05 Chris Mason wrote: >>> > > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -05= 00: >>> > > > Some clarification on my previous message... >>> > > > >>> > > > After looking at my ftrace log more closely, I can see where = Btrfs is >>> > > > trying to release the allocated pages. =A0However, the calcul= ation for >>> > > > the number of dirty_pages is equal to 1 when "copied =3D=3D 0= ". >>> > > > >>> > > > So I'm seeing at least two problems: >>> > > > (1) =A0It keeps looping when "copied =3D=3D 0". >>> > > > (2) =A0One dirty page is not being released on every loop eve= n though >>> > > > "copied =3D=3D 0" (at least this problem keeps it from being = an infinite >>> > > > loop by eventually exhausting reserveable space on the disk). >>> > > >>> > > Hi everyone, >>> > > >>> > > There are actually tow bugs here. =A0First the one that Mitch h= it, and a >>> > > second one that still results in bad file_write results with my >>> > > debugging hunks (the first two hunks below) in place. >>> > > >>> > > My patch fixes Mitch's bug by checking for copied =3D=3D 0 afte= r >>> > > btrfs_copy_from_user and going the correct delalloc accounting.= =A0This >>> > > one looks solved, but you'll notice the patch is bigger. >>> > > >>> > > First, I add some random failures to btrfs_copy_from_user() by = failing >>> > > everyone once and a while. =A0This was much more reliable than = trying to >>> > > use memory pressure than making copy_from_user fail. >>> > > >>> > > If copy_from_user fails and we partially update a page, we end = up with a >>> > > page that may go away due to memory pressure. =A0But, btrfs_fil= e_write >>> > > assumes that only the first and last page may have good data th= at needs >>> > > to be read off the disk. >>> > > >>> > > This patch ditches that code and puts it into prepare_pages ins= tead. >>> > > But I'm still having some errors during long stress.sh runs. =A0= Ideas are >>> > > more than welcome, hopefully some other timezones will kick in = ideas >>> > > while I sleep. >>> > >>> > At least it doesn't fix the emerge-problem for me. The behavior i= s now the same >>> > as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcry= pt' with no >>> > further interaction to get the emerge-process hang with a svn-pro= cess >>> > consuming 100% CPU. I can cancel the emerge-process with ctrl-c b= ut the >>> > spawned svn-process stays and it needs a reboot to get rid of it. >>> >>> Can you cat /proc/$pid/wchan a few times so we can get an idea of w= here it's >>> looping? =A0Thanks, >>> >>> Josef >> >> It behaves the same way here with btrfs-unstable. >> The output of "cat /proc/$pid/wchan" is 0. >> >> // Maria >> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btr= fs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.htm= l >>> >> >> >> > > I've applied the patch at the head of this thread (with the jiffies > debugging commented out) and I'm attaching a ftrace using the > function_graph tracer when I'm stuck in the loop. =A0I've just snippe= d > out a couple of the loops (the full trace file is quite large, and > mostly repititious). > > I'm going to try to modify file.c with some trace_printk debugging to > show the values of several of the relevant variables at various > stages. > > I'm going to try to exit the loop after 256 tries with an EFAULT so I > can stop the tracing at that point and capture a trace of the entry > into the problem (the ftrace ring buffer fills up too fast for me to > capture the entry point). > As promised, I'm put together a modified file.c with many trace_printk debugging statements to augment the ftrace. The trace is ~128K compressed (about 31,600 lines or 2.6MB uncompressed), so I'm putting it up on my local server instead of attaching. Let me know if it would be more appropriate to send to the list as an attachment. http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz I preface all my trace_printk comments with "TPK:" to make skipping through the trace easier. The trace contains the trace of about 3 or 4 successful passes through the btrfs_file_aio_write() function to show what a successful trace looks like. The pass through the btrfs_file_aio_write() that breaks begins on line = 1088. I let it loop through the while (iov_iter_count(&i) > 0) {} loop for 256 times when copied=3D=3D0 (otherwise it would loop infinitely). The= n exit out and stop the trace. =46or reference, here's a diff file for the debugging statements I've added to file.c: Let me know if you would like me to re-run this trial with different debugging lines. fs/btrfs/file.c /usr/src/linux/fs/btrfs/file.c --- fs/btrfs/file.c 2011-02-28 10:13:45.000000000 -0600 +++ /usr/src/linux/fs/btrfs/file.c 2011-02-28 13:07:11.000000000 -0600 @@ -53,12 +53,14 @@ int offset =3D pos & (PAGE_CACHE_SIZE - 1); int total_copied =3D 0; + /*************************** if ((jiffies % 10) =3D=3D 0) return 0; if ((jiffies % 25) =3D=3D 0) { write_bytes /=3D 2; } + **************************/ while (write_bytes > 0) { size_t count =3D min_t(size_t, @@ -82,10 +84,13 @@ /* Return to btrfs_file_aio_write to fault page */ if (unlikely(copied =3D=3D 0)) { + trace_printk("TPK: unlikely copied =3D=3D 0 in btrfs_copy_from_user (total_copied=3D%i)\n", + total_copied); break; } if (unlikely(copied < PAGE_CACHE_SIZE - offset)) { + trace_printk("TPK: unlikely copied < PAGE_CACHE_SIZE - offset in btrfs_copy_from_user\n"); offset +=3D copied; } else { pg++; @@ -910,8 +915,13 @@ int will_write; int buffered =3D 0; int copied =3D 0; + int copied_problem =3D 0; + int copied_loop_count =3D 0; + int stop_ftrace =3D 0; int dirty_pages =3D 0; + trace_printk("TPK: Entering btrfs_file_aio_write()\n"); + will_write =3D ((file->f_flags & O_DSYNC) || IS_SYNC(inode) || (file->f_flags & O_DIRECT)); @@ -953,6 +963,7 @@ BTRFS_I(inode)->sequence++; if (unlikely(file->f_flags & O_DIRECT)) { + trace_printk("TPK: transferred to unlikely(file->f_flags \& O_DIRECT= )\n"); num_written =3D generic_file_direct_write(iocb, iov, &nr_segs, pos, ppos, count, ocount); @@ -984,6 +995,8 @@ */ buffered =3D 1; pos +=3D num_written; + trace_printk("TPK: end unlikely(file->f_flags \& O_DIRECT) with num_written=3D%i\n", + num_written); } iov_iter_init(&i, iov, nr_segs, count, num_written); @@ -1003,6 +1016,8 @@ last_index =3D (pos + iov_iter_count(&i)) >> PAGE_CACHE_SHIFT; while (iov_iter_count(&i) > 0) { + trace_printk("TPK: start section while (iov_iter_count() > 0)\n"); + size_t offset =3D pos & (PAGE_CACHE_SIZE - 1); size_t write_bytes =3D min(iov_iter_count(&i), nrptrs * (size_t)PAGE_CACHE_SIZE - @@ -1010,6 +1025,9 @@ size_t num_pages =3D (write_bytes + offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; + trace_printk("TPK: iov_iter_count()=3D%i || nr_segs=3D%lu || nrptrs=3D= %i || offset=3D%lu || write_bytes=3D%lu || num_pages=3D%lu\n", + iov_iter_count(&i), nr_segs, nrptrs, offset, write_bytes, num_pa= ges); + WARN_ON(num_pages > nrptrs); memset(pages, 0, sizeof(struct page *) * nrptrs); @@ -1022,6 +1040,19 @@ goto out; } + if (unlikely(copied_problem)) { + copied_loop_count++; + trace_printk("TPK: copied problem(%i)\n", + copied_loop_count); + /* Give up if we've already tried 256 times */ + if (copied_loop_count > 256) { + ret =3D -EFAULT; + stop_ftrace =3D 1; + trace_printk("TPK: copied loop count exceeded, returning EFAULT...= =2E\n"); + goto out; + } + } + ret =3D btrfs_delalloc_reserve_space(inode, num_pages << PAGE_CACHE_SHIFT); if (ret) @@ -1045,6 +1076,10 @@ PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; + if (copied =3D=3D 0) { + copied_problem =3D 1; + } + if (num_pages > dirty_pages) { if (copied > 0) atomic_inc( @@ -1080,11 +1115,19 @@ num_written +=3D copied; cond_resched(); + trace_printk("TPK: end section while (iov_iter_count() > 0)\n"); + trace_printk(" copied=3D%i || dirty_pages=3D%i || num_written=3D%i |= | ending iov_iter_count=3D%i\n", + copied, dirty_pages, num_written, iov_iter_count(&i) ); } out: + trace_printk("TPK: Reached: out:\n"); + mutex_unlock(&inode->i_mutex); - if (ret) + if (ret) { err =3D ret; + trace_printk("TPK: ret,err had value %i when out: was reached (num_written: %i)\n", + ret, num_written); + } kfree(pages); *ppos =3D pos; @@ -1140,6 +1183,19 @@ } done: current->backing_dev_info =3D NULL; + if (ret) { + trace_printk("TPK: btrfs_file_aio_write exiting with non-zero ret value (%i)\n", ret); + trace_printk("TPK: Returning num_written (%i) ? num_written (%i) : err (%i) =3D %i\n", + num_written, num_written, err, num_written ? num_written : err); + } else { + trace_printk("TPK: btrfs_file_aio_write exiting normally with (%i)", + num_written ? num_written : err); + } +=09 + if (unlikely(stop_ftrace)) { + tracing_off(); + } + return num_written ? num_written : err; } -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] btrfs file write debugging patch 2011-03-01 10:14 ` Zhong, Xin @ 2011-03-01 11:56 ` Zhong, Xin 2011-03-01 14:54 ` Mitch Harder 2011-03-01 14:51 ` Mitch Harder 1 sibling, 1 reply; 40+ messages in thread From: Zhong, Xin @ 2011-03-01 11:56 UTC (permalink / raw) To: Zhong, Xin, Mitch Harder, Maria Wikström Cc: Josef Bacik, Johannes Hirte, Chris Mason, linux-btrfs@vger.kernel.org Hi Mitch, I suspect there's a lock contention between flush-btrfs (lock_dellalloc= _pages) and btrfs_file_aio_write. However I can not recreate it locally= =2E Could you please try below patch? Thanks! diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 65338a1..b9d0929 1= 00644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1007,17 +1007,16 @@ static ssize_t btrfs_file_aio_write(struct kioc= b *iocb, goto out; } =20 - ret =3D btrfs_delalloc_reserve_space(inode, - num_pages << PAGE_CACHE_SHIFT); - if (ret) - goto out; - ret =3D prepare_pages(root, file, pages, num_pages, pos, first_index, last_index, write_bytes); - if (ret) { - btrfs_delalloc_release_space(inode, + if (ret) + goto out; + =09 + ret =3D btrfs_delalloc_reserve_space(inode, num_pages << PAGE_CACHE_SHIFT); + if (ret) { + btrfs_drop_pages(pages, num_pages); goto out; } -----Original Message----- =46rom: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vge= r.kernel.org] On Behalf Of Zhong, Xin Sent: Tuesday, March 01, 2011 6:15 PM To: Mitch Harder; Maria Wikstr=F6m Cc: Josef Bacik; Johannes Hirte; Chris Mason; linux-btrfs@vger.kernel.o= rg Subject: RE: [PATCH] btrfs file write debugging patch Is your system running out of memory or is there any other thread like = flush-btrfs competing for the same page? I can only see one process in your ftrace log. You may need to trace al= l btrfs.ko function calls instead of a single process. Thanks! -----Original Message----- =46rom: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vge= r.kernel.org] On Behalf Of Mitch Harder Sent: Tuesday, March 01, 2011 4:20 AM To: Maria Wikstr=F6m Cc: Josef Bacik; Johannes Hirte; Chris Mason; Zhong, Xin; linux-btrfs@v= ger.kernel.org Subject: Re: [PATCH] btrfs file write debugging patch 2011/2/28 Mitch Harder <mitch.harder@sabayonlinux.org>: > 2011/2/28 Maria Wikstr=F6m <maria@ponstudios.se>: >> m=E5n 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik: >>> On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: >>> > On Monday 28 February 2011 02:46:05 Chris Mason wrote: >>> > > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -05= 00: >>> > > > Some clarification on my previous message... >>> > > > >>> > > > After looking at my ftrace log more closely, I can see where = Btrfs is >>> > > > trying to release the allocated pages. =A0However, the calcul= ation for >>> > > > the number of dirty_pages is equal to 1 when "copied =3D=3D 0= ". >>> > > > >>> > > > So I'm seeing at least two problems: >>> > > > (1) =A0It keeps looping when "copied =3D=3D 0". >>> > > > (2) =A0One dirty page is not being released on every loop eve= n though >>> > > > "copied =3D=3D 0" (at least this problem keeps it from being = an infinite >>> > > > loop by eventually exhausting reserveable space on the disk). >>> > > >>> > > Hi everyone, >>> > > >>> > > There are actually tow bugs here. =A0First the one that Mitch h= it, and a >>> > > second one that still results in bad file_write results with my >>> > > debugging hunks (the first two hunks below) in place. >>> > > >>> > > My patch fixes Mitch's bug by checking for copied =3D=3D 0 afte= r >>> > > btrfs_copy_from_user and going the correct delalloc accounting.= =A0This >>> > > one looks solved, but you'll notice the patch is bigger. >>> > > >>> > > First, I add some random failures to btrfs_copy_from_user() by = failing >>> > > everyone once and a while. =A0This was much more reliable than = trying to >>> > > use memory pressure than making copy_from_user fail. >>> > > >>> > > If copy_from_user fails and we partially update a page, we end = up with a >>> > > page that may go away due to memory pressure. =A0But, btrfs_fil= e_write >>> > > assumes that only the first and last page may have good data th= at needs >>> > > to be read off the disk. >>> > > >>> > > This patch ditches that code and puts it into prepare_pages ins= tead. >>> > > But I'm still having some errors during long stress.sh runs. =A0= Ideas are >>> > > more than welcome, hopefully some other timezones will kick in = ideas >>> > > while I sleep. >>> > >>> > At least it doesn't fix the emerge-problem for me. The behavior i= s now the same >>> > as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcry= pt' with no >>> > further interaction to get the emerge-process hang with a svn-pro= cess >>> > consuming 100% CPU. I can cancel the emerge-process with ctrl-c b= ut the >>> > spawned svn-process stays and it needs a reboot to get rid of it. >>> >>> Can you cat /proc/$pid/wchan a few times so we can get an idea of w= here it's >>> looping? =A0Thanks, >>> >>> Josef >> >> It behaves the same way here with btrfs-unstable. >> The output of "cat /proc/$pid/wchan" is 0. >> >> // Maria >> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btr= fs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.htm= l >>> >> >> >> > > I've applied the patch at the head of this thread (with the jiffies > debugging commented out) and I'm attaching a ftrace using the > function_graph tracer when I'm stuck in the loop. =A0I've just snippe= d > out a couple of the loops (the full trace file is quite large, and > mostly repititious). > > I'm going to try to modify file.c with some trace_printk debugging to > show the values of several of the relevant variables at various > stages. > > I'm going to try to exit the loop after 256 tries with an EFAULT so I > can stop the tracing at that point and capture a trace of the entry > into the problem (the ftrace ring buffer fills up too fast for me to > capture the entry point). > As promised, I'm put together a modified file.c with many trace_printk debugging statements to augment the ftrace. The trace is ~128K compressed (about 31,600 lines or 2.6MB uncompressed), so I'm putting it up on my local server instead of attaching. Let me know if it would be more appropriate to send to the list as an attachment. http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz I preface all my trace_printk comments with "TPK:" to make skipping through the trace easier. The trace contains the trace of about 3 or 4 successful passes through the btrfs_file_aio_write() function to show what a successful trace looks like. The pass through the btrfs_file_aio_write() that breaks begins on line = 1088. I let it loop through the while (iov_iter_count(&i) > 0) {} loop for 256 times when copied=3D=3D0 (otherwise it would loop infinitely). The= n exit out and stop the trace. =46or reference, here's a diff file for the debugging statements I've added to file.c: Let me know if you would like me to re-run this trial with different debugging lines. fs/btrfs/file.c /usr/src/linux/fs/btrfs/file.c --- fs/btrfs/file.c 2011-02-28 10:13:45.000000000 -0600 +++ /usr/src/linux/fs/btrfs/file.c 2011-02-28 13:07:11.000000000 -0600 @@ -53,12 +53,14 @@ int offset =3D pos & (PAGE_CACHE_SIZE - 1); int total_copied =3D 0; + /*************************** if ((jiffies % 10) =3D=3D 0) return 0; if ((jiffies % 25) =3D=3D 0) { write_bytes /=3D 2; } + **************************/ while (write_bytes > 0) { size_t count =3D min_t(size_t, @@ -82,10 +84,13 @@ /* Return to btrfs_file_aio_write to fault page */ if (unlikely(copied =3D=3D 0)) { + trace_printk("TPK: unlikely copied =3D=3D 0 in btrfs_copy_from_user (total_copied=3D%i)\n", + total_copied); break; } if (unlikely(copied < PAGE_CACHE_SIZE - offset)) { + trace_printk("TPK: unlikely copied < PAGE_CACHE_SIZE - offset in btrfs_copy_from_user\n"); offset +=3D copied; } else { pg++; @@ -910,8 +915,13 @@ int will_write; int buffered =3D 0; int copied =3D 0; + int copied_problem =3D 0; + int copied_loop_count =3D 0; + int stop_ftrace =3D 0; int dirty_pages =3D 0; + trace_printk("TPK: Entering btrfs_file_aio_write()\n"); + will_write =3D ((file->f_flags & O_DSYNC) || IS_SYNC(inode) || (file->f_flags & O_DIRECT)); @@ -953,6 +963,7 @@ BTRFS_I(inode)->sequence++; if (unlikely(file->f_flags & O_DIRECT)) { + trace_printk("TPK: transferred to unlikely(file->f_flags \& O_DIRECT= )\n"); num_written =3D generic_file_direct_write(iocb, iov, &nr_segs, pos, ppos, count, ocount); @@ -984,6 +995,8 @@ */ buffered =3D 1; pos +=3D num_written; + trace_printk("TPK: end unlikely(file->f_flags \& O_DIRECT) with num_written=3D%i\n", + num_written); } iov_iter_init(&i, iov, nr_segs, count, num_written); @@ -1003,6 +1016,8 @@ last_index =3D (pos + iov_iter_count(&i)) >> PAGE_CACHE_SHIFT; while (iov_iter_count(&i) > 0) { + trace_printk("TPK: start section while (iov_iter_count() > 0)\n"); + size_t offset =3D pos & (PAGE_CACHE_SIZE - 1); size_t write_bytes =3D min(iov_iter_count(&i), nrptrs * (size_t)PAGE_CACHE_SIZE - @@ -1010,6 +1025,9 @@ size_t num_pages =3D (write_bytes + offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; + trace_printk("TPK: iov_iter_count()=3D%i || nr_segs=3D%lu || nrptrs=3D= %i || offset=3D%lu || write_bytes=3D%lu || num_pages=3D%lu\n", + iov_iter_count(&i), nr_segs, nrptrs, offset, write_bytes, num_pa= ges); + WARN_ON(num_pages > nrptrs); memset(pages, 0, sizeof(struct page *) * nrptrs); @@ -1022,6 +1040,19 @@ goto out; } + if (unlikely(copied_problem)) { + copied_loop_count++; + trace_printk("TPK: copied problem(%i)\n", + copied_loop_count); + /* Give up if we've already tried 256 times */ + if (copied_loop_count > 256) { + ret =3D -EFAULT; + stop_ftrace =3D 1; + trace_printk("TPK: copied loop count exceeded, returning EFAULT...= =2E\n"); + goto out; + } + } + ret =3D btrfs_delalloc_reserve_space(inode, num_pages << PAGE_CACHE_SHIFT); if (ret) @@ -1045,6 +1076,10 @@ PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; + if (copied =3D=3D 0) { + copied_problem =3D 1; + } + if (num_pages > dirty_pages) { if (copied > 0) atomic_inc( @@ -1080,11 +1115,19 @@ num_written +=3D copied; cond_resched(); + trace_printk("TPK: end section while (iov_iter_count() > 0)\n"); + trace_printk(" copied=3D%i || dirty_pages=3D%i || num_written=3D%i |= | ending iov_iter_count=3D%i\n", + copied, dirty_pages, num_written, iov_iter_count(&i) ); } out: + trace_printk("TPK: Reached: out:\n"); + mutex_unlock(&inode->i_mutex); - if (ret) + if (ret) { err =3D ret; + trace_printk("TPK: ret,err had value %i when out: was reached (num_written: %i)\n", + ret, num_written); + } kfree(pages); *ppos =3D pos; @@ -1140,6 +1183,19 @@ } done: current->backing_dev_info =3D NULL; + if (ret) { + trace_printk("TPK: btrfs_file_aio_write exiting with non-zero ret value (%i)\n", ret); + trace_printk("TPK: Returning num_written (%i) ? num_written (%i) : err (%i) =3D %i\n", + num_written, num_written, err, num_written ? num_written : err); + } else { + trace_printk("TPK: btrfs_file_aio_write exiting normally with (%i)", + num_written ? num_written : err); + } +=09 + if (unlikely(stop_ftrace)) { + tracing_off(); + } + return num_written ? num_written : err; } -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-01 11:56 ` Zhong, Xin @ 2011-03-01 14:54 ` Mitch Harder 0 siblings, 0 replies; 40+ messages in thread From: Mitch Harder @ 2011-03-01 14:54 UTC (permalink / raw) To: Zhong, Xin Cc: Maria Wikström, Josef Bacik, Johannes Hirte, Chris Mason, linux-btrfs@vger.kernel.org On Tue, Mar 1, 2011 at 5:56 AM, Zhong, Xin <xin.zhong@intel.com> wrote: > Hi Mitch, > > I suspect there's a lock contention between flush-btrfs (lock_dellall= oc_pages) and btrfs_file_aio_write. However I can not recreate it local= ly. Could you please try below patch? Thanks! > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 65338a1..b9d0929= 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -1007,17 +1007,16 @@ static ssize_t btrfs_file_aio_write(struct ki= ocb *iocb, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0goto out; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} > > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 ret =3D btrfs_delalloc_reserve_space(in= ode, > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 num_pages << PAGE_CACHE_SHIFT); > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (ret) > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out; > - > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret =3D prepare_pages(root, file, page= s, num_pages, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= pos, first_index, last_index, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= write_bytes); > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (ret) { > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 btrfs_delalloc_release_= space(inode, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (ret) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out; > + > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ret =3D btrfs_delalloc_reserve_space(in= ode, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0num_pages << PAGE_CACHE_SHIFT); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (ret) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 btrfs_drop_pages(pages,= num_pages); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0goto out; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} > > Thanks. I've tested this patch, but the build is still failing at the same point as before. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" = in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-03-01 10:14 ` Zhong, Xin 2011-03-01 11:56 ` Zhong, Xin @ 2011-03-01 14:51 ` Mitch Harder 1 sibling, 0 replies; 40+ messages in thread From: Mitch Harder @ 2011-03-01 14:51 UTC (permalink / raw) To: Zhong, Xin Cc: Maria Wikström, Josef Bacik, Johannes Hirte, Chris Mason, linux-btrfs@vger.kernel.org On Tue, Mar 1, 2011 at 4:14 AM, Zhong, Xin <xin.zhong@intel.com> wrote: > Is your system running out of memory or is there any other thread like flush-btrfs competing for the same page? > There's no sign of memory pressure. Although I only have 1 GB in this box, I'm still show ~1/2 GB RAM free during this build. There's no swap space allocated, and nothing in dmesg that indicates there's a transient spike of RAM pressure. > I can only see one process in your ftrace log. You may need to trace all btrfs.ko function calls instead of a single process. Thanks! > That ftrace.log was run with ftrace defaults for a function trace. It should collect calls from the whole system. For the sake of consistency, I am intentionally trying to insure that very few other things are going on at the same time as this build. And I'm building with "-j1" so things will happen the same way each time. Also, I supplied just the tail end of the trace log. The full log shows a few of the other build processes leading up to the problem, but the ftrace ring buffer fills up surprisingly fast. Even with a 50MB ring buffer for ftrace, I usually collect less than 1 second of information when something busy like a build is going on. Let me know if you'd like to see the full log. It's bigger, but I can find someplace to put it. But I'm pretty sure that wmldbcreate is the only thing that is going on when the breakage occurs. Otherwise I wouldn't get such consistent breakage in the same spot every time. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] btrfs file write debugging patch 2011-02-28 20:20 ` Mitch Harder 2011-03-01 5:09 ` Mitch Harder 2011-03-01 10:14 ` Zhong, Xin @ 2011-03-01 21:56 ` Piotr Szymaniak 2 siblings, 0 replies; 40+ messages in thread From: Piotr Szymaniak @ 2011-03-01 21:56 UTC (permalink / raw) To: Mitch Harder Cc: Josef Bacik, Johannes Hirte, Chris Mason, Zhong, Xin, linux-btrfs [-- Attachment #1: Type: text/plain, Size: 836 bytes --] On Mon, Feb 28, 2011 at 02:20:22PM -0600, Mitch Harder wrote: > As promised, I'm put together a modified file.c with many trace_printk > debugging statements to augment the ftrace. > *snip* Just my few cents. I've applied the patch from Chris Mason (Sun, 27 Feb 2011 20:46:05 -0500) and this one from Mitch (Mon, 28 Feb 2011 14:20:22 -0600) on top of vanilla 2.6.38-rc6 and it seems that it resolves my issues with hanging `svn info' during libgcrypt emerge. Piotr Szymaniak. -- - (...) Nie wyobrazam sobie, co ta gora miesa moglaby ci dac, czego ja nie moglbym ofiarowac. Oczywiscie poza piecdziesiecioma funtami rozrosnietych miesni. - Moze mnie wlasnie pociagaja rozrosniete miesnie. (...) W koncu wielu mezczyzn pociaga rozrosnieta tkanka tluszczowa piersi. -- Graham Masterton, "The Wells of Hell" [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2011-03-08 2:51 UTC | newest] Thread overview: 40+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-03-01 16:36 [PATCH] btrfs file write debugging patch Xin Zhong 2011-03-01 21:09 ` Mitch Harder 2011-03-02 10:58 ` Zhong, Xin 2011-03-02 14:00 ` Xin Zhong 2011-03-04 1:51 ` Chris Mason 2011-03-04 2:32 ` Josef Bacik 2011-03-04 2:42 ` Zhong, Xin 2011-03-04 2:41 ` Josef Bacik 2011-03-04 8:41 ` Zhong, Xin 2011-03-05 16:56 ` Mitch Harder 2011-03-05 17:28 ` Xin Zhong 2011-03-04 12:19 ` Chris Mason 2011-03-04 14:25 ` Xin Zhong 2011-03-04 15:33 ` Mitch Harder 2011-03-04 17:21 ` Mitch Harder 2011-03-05 1:00 ` Xin Zhong 2011-03-05 13:14 ` Mitch Harder 2011-03-05 16:50 ` Mitch Harder 2011-03-06 18:00 ` Chris Mason 2011-03-07 0:58 ` Chris Mason 2011-03-07 6:07 ` Mitch Harder 2011-03-07 6:37 ` Zhong, Xin 2011-03-07 19:56 ` Maria Wikström 2011-03-07 22:12 ` Johannes Hirte 2011-03-08 2:51 ` Zhong, Xin -- strict thread matches above, loose matches on Subject: below -- 2011-02-25 18:43 [PATCH v2]Btrfs: pwrite blocked when writing from the mmaped buffer of the same page Mitch Harder 2011-02-28 1:46 ` [PATCH] btrfs file write debugging patch Chris Mason 2011-02-28 8:56 ` Zhong, Xin 2011-02-28 14:02 ` Chris Mason 2011-02-28 10:13 ` Johannes Hirte 2011-02-28 14:00 ` Chris Mason 2011-02-28 16:10 ` Josef Bacik 2011-02-28 16:45 ` Maria Wikström 2011-02-28 17:47 ` Mitch Harder 2011-02-28 20:20 ` Mitch Harder 2011-03-01 5:09 ` Mitch Harder 2011-03-01 10:14 ` Zhong, Xin 2011-03-01 11:56 ` Zhong, Xin 2011-03-01 14:54 ` Mitch Harder 2011-03-01 14:51 ` Mitch Harder 2011-03-01 21:56 ` Piotr Szymaniak
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).