Re: PLEASE TEST: Everybody who is seeing weird and long hangs

From: Chris Mason <chris.mason@oracle.com>
To: cwillu <cwillu@cwillu.com>
Cc: Josef Bacik <josef@redhat.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: PLEASE TEST: Everybody who is seeing weird and long hangs
Date: Mon, 01 Aug 2011 20:09:01 -0400
Message-ID: <1312243693-sup-7056@shiny>
In-Reply-To: <CAE5mzviteeoNF840ef8_zgVk5ZfX=tgt_k7EVOeEvD47jWeBug@mail.gmail.com>

Excerpts from cwillu's message of 2011-08-01 19:28:35 -0400:
> On Mon, Aug 1, 2011 at 12:21 PM, Chris Mason <chris.mason@oracle.com> wrote:
> > Excerpts from Josef Bacik's message of 2011-08-01 14:01:35 -0400:
> >> On 08/01/2011 01:54 PM, Chris Mason wrote:
> >> > Excerpts from Josef Bacik's message of 2011-08-01 12:03:34 -0400:
> >> >> On 08/01/2011 11:45 AM, Chris Mason wrote:
> >> >>> Excerpts from Josef Bacik's message of 2011-08-01 11:21:34 -0400:
> >> >>>> Hello,
> >> >>>>
> >> >>>> We've seen a lot of reports of people having these constant long pauses
> >> >>>> when doing things like sync or such.  The stack traces usually all look
> >> >>>> the same, one is btrfs-transaction stuck in btrfs_wait_marked_extents
> >> >>>> and one is btrfs-submit-# stuck in get_request_wait.  I had originally
> >> >>>> thought this was due to the new plugging stuff, but I think it just
> >> >>>> makes the problem happen more quickly as we've seen that 2.6.38 which we
> >> >>>> thought was ok will still have the problem happen if given enough time.
> >> >>>>
> >> >>>> I _think_ this is because of the way we write out metadata in the
> >> >>>> transaction commit phase.  We're doing write_one_page for every dirty
> >> >>>> page in the btree during the commit.  This sucks because basically we
> >> >>>> end up with one bio per page, which makes us blow out our nr_requests
> >> >>>> constantly, which is why btrfs-submit-# is always stuck in
> >> >>>> get_request_wait.  What we need to do instead is use filemap_fdatawrite
> >> >>>> which will do a WB_SYNC_ALL but will do it via writepages, so hopefully
> >> >>>> we will get fewer bios and this problem will go away.  Please try this
> >> >>>> very hastily put together patch if you are experiencing this problem and
> >> >>>> let me know if it fixes it for you.  Thanks,
> >> >>>
> >> >>> I'm definitely curious to hear if this helps, but I think it might cause
> >> >>> a different set of problems.  It writes everything that is dirty on the
> >> >>> btree, which includes a lot of things we've cow'd in the current
> >> >>> transaction and marked dirty.  They will have to go through COW again
> >> >>> if someone wants to modify them again.
> >> >>>
> >> >>
> >> >> But this is happening in the commit after we've done all of our work, we
> >> >> shouldn't be dirtying anything else at this point right?
> >> >
> >> > The commit code is setup to unblock people before we start the IO:
> >> >
> >> >         trans->transaction->blocked = 0;
> >> >         spin_lock(&root->fs_info->trans_lock);
> >> >         root->fs_info->running_transaction = NULL;
> >> >         root->fs_info->trans_no_join = 0;
> >> >         spin_unlock(&root->fs_info->trans_lock);
> >> >         mutex_unlock(&root->fs_info->reloc_mutex);
> >> >
> >> >         wake_up(&root->fs_info->transaction_wait);
> >> >
> >> >         ret = btrfs_write_and_wait_transaction(trans, root);
> >> >
> >> > So, we should have concurrent FS mods for a new transaction while we are
> >> > writing out this old transaction.
> >> >
> >>
> >> Ah right, but then this brings up another question, we shouldn't cow
> >> them again since we would have set the new transid.  And isn't this kind
> >> of bad, since somebody could come in and dirty a piece of metadata
> >> before we have a chance to write it out for this transaction, so we end
> >> up writing out the new data instead of what we are trying to commit?
> >
> > I think we're mixing together different ideas here.  If we're doing a
> > commit on transaction N, we allow N+1 to start while we're doing the
> > btrfs_write_and_wait_transaction().  N+1 might allocate and dirty a new
> > block, which btrfs_write_and_wait_transaction might start IO on.
> >
> > Strictly speaking this isn't a problem.  It doesn't break any rules of
> > COW because we're allowed to write metadata at any time.  But, once we
> > do write it, we must COW it again if we want to change it.  So, anything
> > that btrfs_write_and_wait_transaction() catches from transaction N+1 is
> > likely to make more work for us because future mods will have to
> > allocate a new block.  Basically it's wasted IO.
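
The rule in the paragraph above can be modelled in a few lines of
userspace C.  This is only a rough sketch under stated assumptions, not
btrfs code: struct toy_block, toy_needs_cow() and toy_write_block() are
made-up names, and the real check in the kernel looks at more state than
this.  It just shows why a block from N+1 that gets written early costs
an extra allocation later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a metadata block; not the btrfs structure. */
struct toy_block {
	uint64_t generation;	/* id of the transaction that allocated it */
	bool written;		/* true once IO has been started on it */
};

/*
 * A block may be modified in place only if the running transaction
 * allocated it and it has not been written yet.  Anything else needs
 * a fresh copy (COW).
 */
static bool toy_needs_cow(const struct toy_block *b, uint64_t transid)
{
	return b->generation != transid || b->written;
}

/* Model of writeback catching the block: from now on it is on disk. */
static void toy_write_block(struct toy_block *b)
{
	b->written = true;
}
```

In this model a block allocated by transaction N+1 can be changed in
place until toy_write_block() touches it; after that, the same
transaction has to COW it again to modify it.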
> >
> > But, it's also free IO, assuming it was contiguous.  The problem is that
> > write_cache_pages isn't actually making sure it was contiguous, so we
> > end up doing many more writes than we could have.
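
To put rough numbers on the contiguity point, here is a small userspace
sketch.  It is a toy model, not kernel code: count_contiguous_runs() is
a made-up helper, and it assumes the dirty page indices are already
sorted.  It counts how many writes are needed when each contiguous run
of pages is merged into one submission; one write per page would need
as many writes as there are pages.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: given the sorted indices of dirty pages, count how many
 * IOs are needed if every contiguous run of pages is merged into a
 * single submission.  Submitting one bio per page would need exactly
 * n IOs instead.
 */
static size_t count_contiguous_runs(const unsigned long *pages, size_t n)
{
	size_t runs = 0;

	for (size_t i = 0; i < n; i++) {
		/* The model assumes strictly ascending page indices. */
		assert(i == 0 || pages[i] > pages[i - 1]);

		/* A new run starts at the first page or after a gap. */
		if (i == 0 || pages[i] != pages[i - 1] + 1)
			runs++;
	}
	return runs;
}
```

For dirty pages 10, 11, 12, 13, 40, 41 and 90 this gives 3 merged
writes, against 7 if every page goes out on its own.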
>
> First user ("youagree") reported back on irc:
>
> <youagree> guys, just came to report its much worse with josef's patch
> <youagree> now i can hardly start anything, it's slowed down most of the time

Josef's filemap_fdatawrite patch?  He sent a second one to the list that
gets rid of the extra IO done by the current code.  That's the one we
hope will fix things.

-chris