From: Chris Mason <chris.mason@oracle.com>
To: "Zhong, Xin" <xin.zhong@intel.com>
Cc: "Mitch Harder" <mitch.harder@sabayonlinux.org>,
"Maria Wikström" <maria@ponstudios.se>,
"Johannes Hirte" <johannes.hirte@fem.tu-ilmenau.de>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: RE: [PATCH] btrfs file write debugging patch
Date: Mon, 28 Feb 2011 09:02:38 -0500
Message-ID: <1298901635-sup-9740@think>
In-Reply-To: <1865303E0DED764181A9D882DEF65FB68662A5E3F3@shsmsx502.ccr.corp.intel.com>
Excerpts from Zhong, Xin's message of 2011-02-28 03:56:40 -0500:
> One possible issue I can see is in random failure case #2, where copy_from_user only processes half of the data.
>
> For example, suppose it writes a 4k aligned page and copy_from_user only writes 2k. Then it will not call btrfs_delalloc_release_space, since num_pages and dirty_pages are both 1.
> In the next round, it writes another 2k and btrfs_delalloc_reserve_space is called twice for the same page.
>
> Is it a problem? Thanks!
That should be the correct behavior. The result of the short copy_from_user
should be exactly the same as two write calls where one does 2K and the
other does another 2K.
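
To make the accounting in that example concrete, here is a rough userspace
sketch of the dirty_pages rounding as it appears in the patch quoted below.
It assumes 4K pages; PAGE_SHIFT_SKETCH, PAGE_SIZE_SKETCH and both function
names are made up for illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4K pages, standing in for PAGE_CACHE_SHIFT / PAGE_CACHE_SIZE. */
#define PAGE_SHIFT_SKETCH 12
#define PAGE_SIZE_SKETCH ((size_t)1 << PAGE_SHIFT_SKETCH)

/*
 * Rounding used before the patch: number of pages covered by `copied`
 * bytes that start `offset` bytes into the first page.
 */
static size_t dirty_pages_old(size_t copied, size_t offset)
{
	assert(offset < PAGE_SIZE_SKETCH);
	return (copied + offset + PAGE_SIZE_SKETCH - 1) >> PAGE_SHIFT_SKETCH;
}

/* Rounding after the patch: nothing copied means nothing dirtied. */
static size_t dirty_pages_patched(size_t copied, size_t offset)
{
	if (copied == 0)
		return 0;
	return dirty_pages_old(copied, offset);
}
```

With these helpers, a 2K short copy at offset 0 and the follow-up 2K copy at
offset 2048 both round to one dirty page, matching num_pages == 1. The
unpatched rounding also reports one dirty page for copied == 0 whenever
offset is non-zero, which is the case the patch special-cases.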
Either way, it shouldn't result in incorrect bytes in the file, which is
still happening for me with the debugging hunks in place.
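
One way to look for the "incorrect bytes" symptom from userspace is to write
a known pattern in pieces and read it back. The sketch below does that with
pwrite/pread on a temporary file. The helper names, the /tmp path template
and the byte pattern are assumptions for illustration, and this is not the
stress.sh test mentioned in the quoted message.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes of buf starting at file offset 0, at most `chunk` bytes
 * per pwrite call. Returns 0 on success, -1 on error. */
static int write_in_chunks(int fd, const unsigned char *buf, size_t len,
			   size_t chunk)
{
	size_t done = 0;

	assert(chunk > 0);
	while (done < len) {
		size_t want = len - done < chunk ? len - done : chunk;
		ssize_t n = pwrite(fd, buf + done, want, (off_t)done);

		if (n <= 0)
			return -1;
		done += (size_t)n;	/* a short write just continues here */
	}
	return 0;
}

/* Read the file back; returns 0 if it matches buf, 1 on mismatch, -1 on error. */
static int verify_file(int fd, const unsigned char *buf, size_t len)
{
	unsigned char *tmp = malloc(len);
	size_t done = 0;
	int ret;

	if (!tmp)
		return -1;
	while (done < len) {
		ssize_t n = pread(fd, tmp + done, len - done, (off_t)done);

		if (n <= 0) {
			free(tmp);
			return -1;
		}
		done += (size_t)n;
	}
	ret = memcmp(tmp, buf, len) ? 1 : 0;
	free(tmp);
	return ret;
}

/* Write one 4K pattern in `chunk`-sized pieces and check what lands on disk. */
static int check_split_write(size_t chunk)
{
	char path[] = "/tmp/split-write-XXXXXX";	/* hypothetical location */
	unsigned char buf[4096];
	size_t i;
	int fd, ret;

	for (i = 0; i < sizeof(buf); i++)
		buf[i] = (unsigned char)(i * 31 + 7);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	ret = write_in_chunks(fd, buf, sizeof(buf), chunk);
	if (ret == 0)
		ret = verify_file(fd, buf, sizeof(buf));
	close(fd);
	return ret;
}
```

check_split_write(2048) mimics the two-2K-write case and should return 0 on
a filesystem that behaves correctly, the same as check_split_write(4096).
To exercise a particular filesystem, the path template would need to point
at a directory on it.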
-chris
>
> -----Original Message-----
> From: Chris Mason [mailto:chris.mason@oracle.com]
> Sent: Monday, February 28, 2011 9:46 AM
> To: Mitch Harder
> Cc: Maria Wikström; Zhong, Xin; Johannes Hirte; linux-btrfs@vger.kernel.org
> Subject: [PATCH] btrfs file write debugging patch
>
> Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
> > Some clarification on my previous message...
> >
> > After looking at my ftrace log more closely, I can see where Btrfs is
> > trying to release the allocated pages. However, the calculated
> > number of dirty_pages is 1 when "copied == 0".
> >
> > So I'm seeing at least two problems:
> > (1) It keeps looping when "copied == 0".
> > (2) One dirty page is not being released on every loop even though
> > "copied == 0" (at least this problem keeps it from being an infinite
> > loop by eventually exhausting reservable space on the disk).
>
> Hi everyone,
>
> There are actually two bugs here. First the one that Mitch hit, and a
> second one that still results in bad file_write results with my
> debugging hunks (the first two hunks below) in place.
>
> My patch fixes Mitch's bug by checking for copied == 0 after
> btrfs_copy_from_user and doing the correct delalloc accounting. This
> one looks solved, but you'll notice the patch is bigger.
>
> First, I add some random failures to btrfs_copy_from_user() by failing
> every once in a while. This was much more reliable than trying to
> use memory pressure to make copy_from_user fail.
>
> If copy_from_user fails and we partially update a page, we end up with a
> page that may go away due to memory pressure. But, btrfs_file_write
> assumes that only the first and last page may have good data that needs
> to be read off the disk.
>
> This patch ditches that code and puts it into prepare_pages instead.
> But I'm still having some errors during long stress.sh runs. Ideas are
> more than welcome, hopefully some other timezones will kick in ideas
> while I sleep.
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 7084140..89a6a26 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -54,6 +54,13 @@ static noinline int btrfs_copy_from_user(loff_t pos, int num_pages,
>  	int offset = pos & (PAGE_CACHE_SIZE - 1);
>  	int total_copied = 0;
>
> +	if ((jiffies % 10) == 0)
> +		return 0;
> +
> +	if ((jiffies % 25) == 0) {
> +		write_bytes /= 2;
> +	}
> +
>  	while (write_bytes > 0) {
>  		size_t count = min_t(size_t,
>  				     PAGE_CACHE_SIZE - offset, write_bytes);
> @@ -763,6 +770,26 @@ out:
>  }
>
>  /*
> + * on error we return an unlocked page and the error value
> + * on success we return a locked page and 0
> + */
> +static int prepare_uptodate_page(struct page *page, u64 pos)
> +{
> +	int ret = 0;
> +	if ((pos & (PAGE_CACHE_SIZE - 1)) && !PageUptodate(page)) {
> +		ret = btrfs_readpage(NULL, page);
> +		if (ret)
> +			return ret;
> +		lock_page(page);
> +		if (!PageUptodate(page)) {
> +			unlock_page(page);
> +			return -EIO;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
>   * this gets pages into the page cache and locks them down, it also properly
>   * waits for data=ordered extents to finish before allowing the pages to be
>   * modified.
> @@ -777,6 +804,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
>  	unsigned long index = pos >> PAGE_CACHE_SHIFT;
>  	struct inode *inode = fdentry(file)->d_inode;
>  	int err = 0;
> +	int faili = 0;
>  	u64 start_pos;
>  	u64 last_pos;
>
> @@ -794,15 +822,24 @@ again:
>  	for (i = 0; i < num_pages; i++) {
>  		pages[i] = grab_cache_page(inode->i_mapping, index + i);
>  		if (!pages[i]) {
> -			int c;
> -			for (c = i - 1; c >= 0; c--) {
> -				unlock_page(pages[c]);
> -				page_cache_release(pages[c]);
> -			}
> -			return -ENOMEM;
> +			faili = i - 1;
> +			err = -ENOMEM;
> +			goto fail;
> +		}
> +
> +		if (i == 0)
> +			err = prepare_uptodate_page(pages[i], pos);
> +		else if (i == num_pages - 1)
> +			err = prepare_uptodate_page(pages[i],
> +						    pos + write_bytes);
> +		if (err) {
> +			page_cache_release(pages[i]);
> +			faili = i - 1;
> +			goto fail;
>  		}
>  		wait_on_page_writeback(pages[i]);
>  	}
> +	err = 0;
>  	if (start_pos < inode->i_size) {
>  		struct btrfs_ordered_extent *ordered;
>  		lock_extent_bits(&BTRFS_I(inode)->io_tree,
> @@ -842,6 +879,14 @@ again:
>  		WARN_ON(!PageLocked(pages[i]));
>  	}
>  	return 0;
> +fail:
> +	while (faili >= 0) {
> +		unlock_page(pages[faili]);
> +		page_cache_release(pages[faili]);
> +		faili--;
> +	}
> +	return err;
> +
>  }
>
>  static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
> @@ -851,7 +896,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
>  	struct file *file = iocb->ki_filp;
>  	struct inode *inode = fdentry(file)->d_inode;
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
> -	struct page *pinned[2];
>  	struct page **pages = NULL;
>  	struct iov_iter i;
>  	loff_t *ppos = &iocb->ki_pos;
> @@ -872,9 +916,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
>  	will_write = ((file->f_flags & O_DSYNC) || IS_SYNC(inode) ||
>  		      (file->f_flags & O_DIRECT));
>
> -	pinned[0] = NULL;
> -	pinned[1] = NULL;
> -
>  	start_pos = pos;
>
>  	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
> @@ -962,32 +1003,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
>  	first_index = pos >> PAGE_CACHE_SHIFT;
>  	last_index = (pos + iov_iter_count(&i)) >> PAGE_CACHE_SHIFT;
>
> -	/*
> -	 * there are lots of better ways to do this, but this code
> -	 * makes sure the first and last page in the file range are
> -	 * up to date and ready for cow
> -	 */
> -	if ((pos & (PAGE_CACHE_SIZE - 1))) {
> -		pinned[0] = grab_cache_page(inode->i_mapping, first_index);
> -		if (!PageUptodate(pinned[0])) {
> -			ret = btrfs_readpage(NULL, pinned[0]);
> -			BUG_ON(ret);
> -			wait_on_page_locked(pinned[0]);
> -		} else {
> -			unlock_page(pinned[0]);
> -		}
> -	}
> -	if ((pos + iov_iter_count(&i)) & (PAGE_CACHE_SIZE - 1)) {
> -		pinned[1] = grab_cache_page(inode->i_mapping, last_index);
> -		if (!PageUptodate(pinned[1])) {
> -			ret = btrfs_readpage(NULL, pinned[1]);
> -			BUG_ON(ret);
> -			wait_on_page_locked(pinned[1]);
> -		} else {
> -			unlock_page(pinned[1]);
> -		}
> -	}
> -
>  	while (iov_iter_count(&i) > 0) {
>  		size_t offset = pos & (PAGE_CACHE_SIZE - 1);
>  		size_t write_bytes = min(iov_iter_count(&i),
> @@ -1024,8 +1039,12 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
>
>  		copied = btrfs_copy_from_user(pos, num_pages,
>  					   write_bytes, pages, &i);
> -		dirty_pages = (copied + offset + PAGE_CACHE_SIZE - 1) >>
> -				PAGE_CACHE_SHIFT;
> +		if (copied == 0)
> +			dirty_pages = 0;
> +		else
> +			dirty_pages = (copied + offset +
> +				       PAGE_CACHE_SIZE - 1) >>
> +				       PAGE_CACHE_SHIFT;
>
>  		if (num_pages > dirty_pages) {
>  			if (copied > 0)
> @@ -1069,10 +1088,6 @@ out:
>  		err = ret;
>
>  	kfree(pages);
> -	if (pinned[0])
> -		page_cache_release(pinned[0]);
> -	if (pinned[1])
> -		page_cache_release(pinned[1]);
>  	*ppos = pos;
>
>  	/*