From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Ming Lin <mlin@kernel.org>, Christoph Hellwig <hch@lst.de>,
linux-xfs@vger.kernel.org,
Ceph Development <ceph-devel@vger.kernel.org>,
"LIU, Fei" <james.liu@alibaba-inc.com>,
xiongwei.jiang@alibaba-inc.com, boqian.zy@alibaba-inc.com,
sheng.qiu@alibaba-inc.com, Sage Weil <sweil@redhat.com>
Subject: Re: xlog_write: reservation ran out
Date: Wed, 3 May 2017 10:10:32 -0400 [thread overview]
Message-ID: <20170503141030.GA11549@bfoster.bfoster> (raw)
In-Reply-To: <20170502235152.GN12369@dastard>
On Wed, May 03, 2017 at 09:51:52AM +1000, Dave Chinner wrote:
> On Mon, May 01, 2017 at 10:41:06AM -0700, Ming Lin wrote:
> > On 5/1/2017 6:12 AM, Brian Foster wrote:
> > > On Sun, Apr 30, 2017 at 11:10:15PM -0700, Ming Lin wrote:
> > >>
> > >> On 4/28/2017 1:56 PM, Ming Lin wrote:
> > >>> I'm new to xfs code.
> > >>>
> > >>> Search XFS_TRANS_INACTIVE and the usage is like below,
> > >>>
> > >>> xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
> > >>> xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
> > >>>
> > >>> xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
> > >>> xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree, XFS_IFREE_SPACE_RES(mp), 0);
> > >>>
> > >>> seems tr_remove is not related.
> > >>> I'll just try to enlarge the reservation for tr_itruncate and tr_ifree.
> > >>
> > >> Now things are a little bit more clear. I tried below debug patch.
> > >> The t_decrease[] array was used to track where the space was used.
> > >>
> > >> fs/xfs/libxfs/xfs_trans_resv.c | 4 ++--
> > >> fs/xfs/xfs_log.c | 23 ++++++++++++++++++++---
> > >> fs/xfs/xfs_log_cil.c | 8 ++++++++
> > >> fs/xfs/xfs_log_priv.h | 3 +++
> > >> fs/xfs/xfs_super.c | 1 +
> > >> 5 files changed, 34 insertions(+), 5 deletions(-)
> > >>
> > > ...
> > >> 277 static void
> > >> 278 xlog_cil_insert_items(
> > >> 279 struct xlog *log,
> > >> 280 struct xfs_trans *tp)
> > >> 281 {
> > >>
> > >> ....
> > >>
> > >> 340 /* do we need space for more log record headers? */
> > >> 341 iclog_space = log->l_iclog_size - log->l_iclog_hsize;
> > >> 342 if (len > 0 && (ctx->space_used / iclog_space !=
> > >> 343 (ctx->space_used + len) / iclog_space)) {
> > >> 344 int hdrs;
> > >> 345
> > >> 346 hdrs = (len + iclog_space - 1) / iclog_space;
> > >> 347 /* need to take into account split region headers, too */
> > >> 348 hdrs *= log->l_iclog_hsize + sizeof(struct xlog_op_header);
> > >> 349 ctx->ticket->t_unit_res += hdrs;
> > >> 350 ctx->ticket->t_curr_res += hdrs;
> > >> 351 tp->t_ticket->t_curr_res -= hdrs;
> > >> 352 tp->t_ticket->t_decrease[6] = hdrs;
> > >> 353 ASSERT(tp->t_ticket->t_curr_res >= len);
> > >> 354 }
> > >> 355 tp->t_ticket->t_curr_res -= len;
> > >> 356 tp->t_ticket->t_decrease[7] = len;
> > >> 357 ctx->space_used += len;
> > >> 358
> > >> 359 spin_unlock(&cil->xc_cil_lock);
> > >> 360 }
> > >>
> > >> Any idea why it used so many reservation space here?
> > >>
> > >
> > > Nothing really rings a bell for me atm. Perhaps others might have ideas.
> > > That does appear to be a sizable overrun, as opposed to a few bytes that
> > > could more likely be attributed to rounding, header accounting issues or
> > > something of that nature.
> >
> > FYI, here are some numbers.
> >
> > The original "unit res" is 83024. I made it x2 larger, so now it's 166048
> > "unit res" - "current res" = the reservation space already used
> >
> > XFS (nvme10n1p1): xlog_write: reservation summary:
> > trans type = INACTIVE (3)
> > unit res = 166048 bytes
> > current res = 77088 bytes
> > total reg = 0 bytes (o/flow = 0 bytes)
> > ophdrs = 0 (ophdr space = 0 bytes)
> > ophdr + reg = 0 bytes
> > num regions = 0
>
> So the problem here is that the ticket summary is being printed
> after everything useful has been removed from the ticket and moved
> to the CIL. i.e. for more useful debug, this overrun condition
> needs to be checked between the xlog_cil_insert_format_items() call
> and moving the items off the transaction in xfs_log_commit_cil().
>
> That way xlog_print_tic_res() will actually tell us how many regions
> are being logged and give us a much better idea of the scope of
> changes logged in the transaction that lead to the overrun.
>
Hmm.. interesting idea, but I'm not sure that just moving the ticket
summary prior to the CIL insertion tells us much. For one, this is a
transaction ticket and afaict the region count bits are only really used
for a CIL ticket, unless I'm missing something..? IOWs, it seems to me
that if we had such pre-CIL format overrun detection, we'd probably want
to add new code to dump information about what is attached to the
transaction so we can analyze what was logged against the actual
reservation.
Assuming we add such code, we'd still need to actually detect the
overrun before doing the insert, which is currently where we calculate
the consumed transaction reservation (i.e., in the ->iop_format()
callouts). I guess we could look into refactoring that, though I fear
that would get ugly rather quickly. I'm also not sure how deterministic
we can be given some of the additional drains on the transaction ticket
reservation from xlog_cil_insert_items(). Perhaps we could assume worst
case in that regard?
I note that shadow buffer allocation already processes the size of each
item. I wonder if we could just use that to perform a semi-lazy
pre-insert check to see if we've already blown the reservation before
the CIL has started to process the tp (leaving the post-insert check in
place, as well). For example, update xlog_cil_alloc_shadow_bufs() to
return a total 'nbytes' from the dirty tp items and use that value in
the caller. Hmm, thinking more about it, I suppose we could just add
another similar helper since we'd presumably want to process the list of
items again if the reservation is overrun. Thoughts?
Brian
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-05-03 14:10 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CAF1ivSZeYhnzh6CPZViYUNFXdm33hhTA_CzP7V745QociEEnKg@mail.gmail.com>
2017-04-28 20:24 ` xlog_write: reservation ran out Ming Lin
2017-04-28 20:56 ` Ming Lin
2017-05-01 6:10 ` Ming Lin
2017-05-01 13:12 ` Brian Foster
2017-05-01 17:41 ` Ming Lin
2017-05-01 18:48 ` Brian Foster
2017-05-01 20:18 ` Ming Lin
2017-05-02 23:51 ` Dave Chinner
2017-05-03 14:10 ` Brian Foster [this message]
2017-05-03 15:53 ` Darrick J. Wong
2017-05-03 16:15 ` Brian Foster
2017-05-04 0:37 ` Dave Chinner
2017-05-04 0:34 ` Dave Chinner
2017-05-04 11:51 ` Brian Foster
2017-05-05 3:16 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170503141030.GA11549@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=boqian.zy@alibaba-inc.com \
--cc=ceph-devel@vger.kernel.org \
--cc=david@fromorbit.com \
--cc=hch@lst.de \
--cc=james.liu@alibaba-inc.com \
--cc=linux-xfs@vger.kernel.org \
--cc=mlin@kernel.org \
--cc=sheng.qiu@alibaba-inc.com \
--cc=sweil@redhat.com \
--cc=xiongwei.jiang@alibaba-inc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).