From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Ming Lin <mlin@kernel.org>, Christoph Hellwig <hch@lst.de>,
linux-xfs@vger.kernel.org,
Ceph Development <ceph-devel@vger.kernel.org>,
"LIU, Fei" <james.liu@alibaba-inc.com>,
xiongwei.jiang@alibaba-inc.com, boqian.zy@alibaba-inc.com,
sheng.qiu@alibaba-inc.com, Sage Weil <sweil@redhat.com>
Subject: Re: xlog_write: reservation ran out
Date: Thu, 4 May 2017 10:34:35 +1000 [thread overview]
Message-ID: <20170504003435.GC17542@dastard> (raw)
In-Reply-To: <20170503141030.GA11549@bfoster.bfoster>
On Wed, May 03, 2017 at 10:10:32AM -0400, Brian Foster wrote:
> On Wed, May 03, 2017 at 09:51:52AM +1000, Dave Chinner wrote:
> > On Mon, May 01, 2017 at 10:41:06AM -0700, Ming Lin wrote:
> > > On 5/1/2017 6:12 AM, Brian Foster wrote:
> > > > On Sun, Apr 30, 2017 at 11:10:15PM -0700, Ming Lin wrote:
> > > >>
> > > >> On 4/28/2017 1:56 PM, Ming Lin wrote:
> > > >>> I'm new to xfs code.
> > > >>>
> > > >>> Search XFS_TRANS_INACTIVE and the usage is like below,
> > > >>>
> > > >>> xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
> > > >>> xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
> > > >>>
> > > >>> xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
> > > >>> xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree, XFS_IFREE_SPACE_RES(mp), 0);
> > > >>>
> > > >>> seems tr_remove is not related.
> > > >>> I'll just try to enlarge the reservation for tr_itruncate and tr_ifree.
> > > >>
> > > >> Now things are a little bit more clear. I tried below debug patch.
> > > >> The t_decrease[] array was used to track where the space was used.
> > > >>
> > > >> fs/xfs/libxfs/xfs_trans_resv.c | 4 ++--
> > > >> fs/xfs/xfs_log.c | 23 ++++++++++++++++++++---
> > > >> fs/xfs/xfs_log_cil.c | 8 ++++++++
> > > >> fs/xfs/xfs_log_priv.h | 3 +++
> > > >> fs/xfs/xfs_super.c | 1 +
> > > >> 5 files changed, 34 insertions(+), 5 deletions(-)
> > > >>
> > > > ...
> > > >> 277 static void
> > > >> 278 xlog_cil_insert_items(
> > > >> 279 struct xlog *log,
> > > >> 280 struct xfs_trans *tp)
> > > >> 281 {
> > > >>
> > > >> ....
> > > >>
> > > >> 340 /* do we need space for more log record headers? */
> > > >> 341 iclog_space = log->l_iclog_size - log->l_iclog_hsize;
> > > >> 342 if (len > 0 && (ctx->space_used / iclog_space !=
> > > >> 343 (ctx->space_used + len) / iclog_space)) {
> > > >> 344 int hdrs;
> > > >> 345
> > > >> 346 hdrs = (len + iclog_space - 1) / iclog_space;
> > > >> 347 /* need to take into account split region headers, too */
> > > >> 348 hdrs *= log->l_iclog_hsize + sizeof(struct xlog_op_header);
> > > >> 349 ctx->ticket->t_unit_res += hdrs;
> > > >> 350 ctx->ticket->t_curr_res += hdrs;
> > > >> 351 tp->t_ticket->t_curr_res -= hdrs;
> > > >> 352 tp->t_ticket->t_decrease[6] = hdrs;
> > > >> 353 ASSERT(tp->t_ticket->t_curr_res >= len);
> > > >> 354 }
> > > >> 355 tp->t_ticket->t_curr_res -= len;
> > > >> 356 tp->t_ticket->t_decrease[7] = len;
> > > >> 357 ctx->space_used += len;
> > > >> 358
> > > >> 359 spin_unlock(&cil->xc_cil_lock);
> > > >> 360 }
> > > >>
> > > >> Any idea why it used so many reservation space here?
> > > >>
> > > >
> > > > Nothing really rings a bell for me atm. Perhaps others might have ideas.
> > > > That does appear to be a sizable overrun, as opposed to a few bytes that
> > > > could more likely be attributed to rounding, header accounting issues or
> > > > something of that nature.
> > >
> > > FYI, here are some numbers.
> > >
> > > The original "unit res" is 83024. I made it x2 larger, so now it's 166048
> > > "unit res" - "current res" = the reservation space already used
> > >
> > > XFS (nvme10n1p1): xlog_write: reservation summary:
> > > trans type = INACTIVE (3)
> > > unit res = 166048 bytes
> > > current res = 77088 bytes
> > > total reg = 0 bytes (o/flow = 0 bytes)
> > > ophdrs = 0 (ophdr space = 0 bytes)
> > > ophdr + reg = 0 bytes
> > > num regions = 0
> >
> > So the problem here is that the ticket summary is being printed
> > after everything useful has been removed from the ticket and moved
> > to the CIL. i.e. for more useful debug, this overrun condition
> > needs to be checked between the xlog_cil_insert_format_items() call
> > and moving the items off the transaction in xfs_log_commit_cil().
> >
> > That way xlog_print_tic_res() will actually tell us how many regions
> > are being logged and give us a much better idea of the scope of
> > changes logged in the transaction that lead to the overrun.
> >
>
> Hmm.. interesting idea, but I'm not sure that just moving the ticket
> summary prior to the CIL insertion tells us much. For one, this is a
> transaction ticket and afaict the region count bits are only really used
> for a CIL ticket, unless I'm missing something..?
Seems like it - I was talking about dumping the logged regions in the
transaction log items, not the formatted log vectors passed to
xlog_write() from the CIL.
We've got the log items and their vectors still attached to the
transaction pointer after the call to xlog_cil_insert_format_items()
(i.e. on the tp->t_items list), and we know what the formatted
length is going to be and so we can directly check for overrun
before doing anything else. i.e. something like:
....
xlog_cil_insert_format_items(log, tp, &len, &diff_iovecs);
/* account for space used by new iovec headers */
len += diff_iovecs * sizeof(xlog_op_header_t);
/* trivial overrun check */
if (tp->t_ticket->t_curr_res - len < 0) {
/* factor printing into helper */
xfs_warn("transaction reservation overrun <transaction info>")
/* dump transaction/ticket info */
list_for_each_entry(lidp, &tp->t_items, lid_trans) {
struct xfs_log_item *lip = lidp->lid_item;
struct xfs_log_iovec *lv = lip->li_lv;
/* dump log item info */
xfs_warn("Log item type, flags, ...");
/* dump lv array info */
/* dump formatted log vector regions */
for (i = 0; i < lv->lv_niovecs; i++) {
/* dump individual lv info */
/* dump first 32 bytes */
}
}
}
> IOWs, it seems to me
> that if we had such pre-CIL format overrun detection, we'd probably want
> to add new code to dump information about what is attached to the
> transaction so we can analyze what was logged against the actual
> reservation.
Yes, that's exactly what I'm suggesting needs to be done.
Keep in mind this is what xlog_print_tic_res() did prior to the
introduction of delayed logging. i.e. I broke the detailed region
logging of transaction overruns 7 years ago when changing it to
provide detailed logging of CIL ticket reservation overruns, and
this is the first time since then that we've needed transaction
overrun debugging...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2017-05-04 0:34 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CAF1ivSZeYhnzh6CPZViYUNFXdm33hhTA_CzP7V745QociEEnKg@mail.gmail.com>
2017-04-28 20:24 ` xlog_write: reservation ran out Ming Lin
2017-04-28 20:56 ` Ming Lin
2017-05-01 6:10 ` Ming Lin
2017-05-01 13:12 ` Brian Foster
2017-05-01 17:41 ` Ming Lin
2017-05-01 18:48 ` Brian Foster
2017-05-01 20:18 ` Ming Lin
2017-05-02 23:51 ` Dave Chinner
2017-05-03 14:10 ` Brian Foster
2017-05-03 15:53 ` Darrick J. Wong
2017-05-03 16:15 ` Brian Foster
2017-05-04 0:37 ` Dave Chinner
2017-05-04 0:34 ` Dave Chinner [this message]
2017-05-04 11:51 ` Brian Foster
2017-05-05 3:16 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170504003435.GC17542@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=boqian.zy@alibaba-inc.com \
--cc=ceph-devel@vger.kernel.org \
--cc=hch@lst.de \
--cc=james.liu@alibaba-inc.com \
--cc=linux-xfs@vger.kernel.org \
--cc=mlin@kernel.org \
--cc=sheng.qiu@alibaba-inc.com \
--cc=sweil@redhat.com \
--cc=xiongwei.jiang@alibaba-inc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).