From: Mark Tinguely <tinguely@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: bpm@sgi.com, xfs@oss.sgi.com
Subject: Re: [PATCH 0/3] xfs: allocation worker causes freelist buffer lock hang
Date: Thu, 27 Sep 2012 15:10:38 -0500 [thread overview]
Message-ID: <5064B2BE.8060405@sgi.com> (raw)
In-Reply-To: <20120926234149.GJ29154@dastard>
On 09/26/12 18:41, Dave Chinner wrote:
> On Wed, Sep 26, 2012 at 09:14:14AM -0500, Mark Tinguely wrote:
>> On 09/25/12 17:01, Dave Chinner wrote:
>>> On Tue, Sep 25, 2012 at 10:14:16AM -0500, Mark Tinguely wrote:
>>
>> <deletes>
>>
>>>>>>
>>>>>> As a bonus, consolidating the loops into one worker actually gives a slight
>>>>>> performance advantage.
>>>>>
>>>>> Can you quantify it?
>>>>
>>>> I was comparing the bonnie and iozone benchmarks outputs. I will see
>>>> if someone can enlighten me on how to quantify those numbers.
>>>
>>> Ugh.
>>>
>>> Don't bother. Those are two of the worst offenders in the "useless
>>> benchmarks for regression testing" category. Yeah, they *look* like
>>> they give decent numbers, but I've wasted so much time looking at
>>> results from these benchmarks only to find they do basic things wrong
>>> and give numbers that vary simply because you've made a change that
>>> increases or decreases the CPU cache footprint of a code path.
>>>
>>> e.g. IOZone uses the same memory buffer as the source/destination of
>>> all its IO, and does not touch the contents of it at all. Hence for
>>> small IO, the buffer stays resident in the CPU caches and gives
>>> unrealistically high throughput results. Worse is the fact that CPU
>>> cache residency of the buffer can change according to the kernel
>>> code path taken, so you can get massive changes in throughput just
>>> by changing the layout of the code without changing any logic....
>>>
>>> IOZone can be useful if you know exactly what you are doing and
>>> using it to test a specific code path with a specific set of
>>> configurations. e.g. comparing ext3/4/xfs/btrfs on the same kernel
>>> and storage is fine. However, the moment you start using it to
>>> compare different kernels, it's a total crap shoot....
>>
>> Does anyone have a good benchmark XFS should use to share
>> performance results? A number that we can agree shows a series does
>> not degrade the filesystem.
>
> ffsb and fio with benchmark style workload definitions, fsmark when
> using total runtime rather than files/s for comparison. I also have
> a bunch of hand-written scripts that I've written specifically for
> testing things like directory traversal and IO parallelism
> scalability.
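As an illustration of the "benchmark style workload definition" approach with fio, a minimal job file might look like the following (the directory, size, and runtime values are placeholders to be tuned for the machine under test, not recommended settings):

```ini
; Hypothetical fio job: four sequential writers with a fixed
; runtime, so runs on different kernels are directly comparable.
[global]
directory=/mnt/test
size=1g
bs=4k
ioengine=psync
runtime=60
time_based

[seq-write]
rw=write
numjobs=4
```

Because the workload is fully specified in the job file, the same definition can be replayed across kernels and filesystems, which is exactly what the ad-hoc load generators make difficult.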
>
> The problem is that common benchmarks like iozone, postmark,
> bonnie++ and dbench are really just load generators that output
> numbers - they are all very sensitive to many non-filesystem
> changes (e.g. VM tunings) and so have to be run "just right" to
> produce meaningful numbers. That "just right" changes from kernel
> to kernel, machine to machine (running the same kernel!) and
> filesystem to filesystem.
>
> Dbench is particularly sensitive to VM changes and tunings, as well
> as log IO latency. e.g. running it on the same system on different
> regions of the same disk gives significantly different results
> because writes to the log are slower on the inner regions of the
> disk. Indeed, just changing the log stripe unit without changing
> anything else can have a marked effect on results...
>
> I have machines here that I can't use IOzone at all for IO sizes of
> less than 4MB because the timing loops it uses don't have enough
> granularity to accurately measure syscall times, and hence it
> doesn't report throughputs reliably. The worst part is that in the
> results output, it doesn't tell you it had problems - you have to
> look in the log files to get that information and realise that the
> entire benchmark run was compromised...
>
>> lies, damn lies, statistics and then filesystem benchmarks?! :)
>
> I think I've said that myself many times in the past. ;)
Thank you for the information.
>
>>> I guess I don't understand what you mean by "loop on
>>> xfs_alloc_vextent()" then.
>>>
>>> The problem I see above is this:
>>>
>>> thread 1 worker 1 worker 2..max
>>> xfs_bmapi_write(userdata)
>>
>> loops here calling xfs_bmapi_allocate()
>
> I don't think it can for userdata allocations. Metadata, yes,
> userdata, no.
>
>>> xfs_bmapi_allocate(user)
>>> xfs_alloc_vextent(user)
>>> wait
>>>
>>> _xfs_alloc_vextent()
>>> locks AGF
>>
>> on the first pass through the loop it takes the lock
>
>> on one of the subsequent passes through the
>> loop it cannot get a worker. Deadlock here.
>
>>
>> I saved the xfs_bmalloca and xfs_alloc_arg structures when
>> allocating a buffer to verify the paths.
>>
>>> _xfs_alloc_vextent()
>>> blocks on AGF lock
>>
>>> completes allocation
>>>
>>> <returns with AGF locked in transaction>
>
> And then xfs_bmapi_allocate() increments bma->nallocs after each
> successful allocation. Hence the only way we can loop there for
> userdata allocations is if the nmap parameter passed in is > 1.
>
> When it returns to xfs_bmapi_write(), it trims and updates the map,
> and then this code prevents it from looping:
>
> 	/*
> 	 * If we're done, stop now.  Stop when we've allocated
> 	 * XFS_BMAP_MAX_NMAP extents no matter what.  Otherwise
> 	 * the transaction may get too big.
> 	 */
> 	if (bno >= end || n >= *nmap || bma.nallocs >= *nmap)
> 		break;
>
> Because the userdata callers use these values for *nmap:
>
> xfs_iomap_write_direct nimaps = 1;
> xfs_iomap_write_allocate nimaps = 1;
> xfs_iomap_write_unwritten nimaps = 1;
> xfs_alloc_file_space nimaps = 1;
>
> So basically it will break out of the loop after a single
> allocation, and hence should not be getting stuck on a second
> allocation in xfs_bmapi_allocate()/xfs_alloc_vextent() with the AGF
> locked from this code path.
>
>
>>> xfs_bmap_add_extent_hole_real
>>> xfs_bmap_extents_to_btree
>>> xfs_alloc_vextent(user)
>>> wait
>>
>> this does not need a worker, and since in the same
>
> I agree that it does not *need* a worker, but it will *use* a worker
> thread if args->userdata is not initialised to zero.
>
>> transaction all locks to the AGF buffer are recursive locks.
>
> Right, the worker won't block on the AGF lock, but if it can't get a
> worker because they are all blocked on the AGF lock, then it
> deadlocks.
>
> As I said before, I cannot see any other path that will trigger a
> userdata allocation with the AGF locked. So does patch 3 by itself
> solve the problem?
>
I thought I tried it, but I will retry it when I have a free machine.
--Mark.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
Thread overview: 29+ messages
2012-09-19 16:31 [PATCH 0/3] xfs: allocation worker causes freelist buffer lock hang tinguely
2012-09-19 16:31 ` [PATCH 1/3] xfs: restrict allocate worker to x86_64 tinguely
2012-09-19 21:54 ` Dave Chinner
2012-09-20 17:37 ` Mark Tinguely
2012-09-24 17:37 ` Ben Myers
2012-09-25 0:14 ` Dave Chinner
2012-09-19 16:31 ` [PATCH 2/3] xfs: move allocate worker tinguely
2012-09-19 16:31 ` [PATCH 3/3] xfs: zero allocation_args on the kernel stack tinguely
2012-09-19 23:41 ` Dave Chinner
2012-09-20 18:16 ` [PATCH 3/3 v2] " Mark Tinguely
2012-09-25 20:20 ` Ben Myers
2012-10-18 22:52 ` Ben Myers
2012-09-19 23:34 ` [PATCH 0/3] xfs: allocation worker causes freelist buffer lock hang Dave Chinner
2012-09-20 13:49 ` Mark Tinguely
2012-09-24 13:25 ` Dave Chinner
2012-09-24 17:11 ` Ben Myers
2012-09-24 18:09 ` Mark Tinguely
2012-09-25 0:56 ` Dave Chinner
2012-09-25 15:14 ` Mark Tinguely
2012-09-25 22:01 ` Dave Chinner
2012-09-26 14:14 ` Mark Tinguely
2012-09-26 23:41 ` Dave Chinner
2012-09-27 20:10 ` Mark Tinguely [this message]
2012-09-28 3:08 ` Dave Chinner
2012-10-01 22:10 ` [PATCH 0/3] xfs: allocation worker causes freelist buffer lock Mark Tinguely
2012-10-01 23:10 ` Dave Chinner
2012-09-27 22:48 ` [PATCH 0/3] xfs: allocation worker causes freelist buffer lock hang Ben Myers
2012-09-27 23:17 ` Mark Tinguely
2012-09-27 23:27 ` Mark Tinguely