All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mark Tinguely <tinguely@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: bpm@sgi.com, xfs@oss.sgi.com
Subject: Re: [PATCH 0/3] xfs: allocation worker causes freelist buffer lock hang
Date: Thu, 27 Sep 2012 15:10:38 -0500	[thread overview]
Message-ID: <5064B2BE.8060405@sgi.com> (raw)
In-Reply-To: <20120926234149.GJ29154@dastard>

On 09/26/12 18:41, Dave Chinner wrote:
> On Wed, Sep 26, 2012 at 09:14:14AM -0500, Mark Tinguely wrote:
>> On 09/25/12 17:01, Dave Chinner wrote:
>>> On Tue, Sep 25, 2012 at 10:14:16AM -0500, Mark Tinguely wrote:
>>
>> <deletes>
>>
>>>>>>
>>>>>> As a bonus, consolidating the loops into one worker actually gives a slight
>>>>>> performance advantage.
>>>>>
>>>>> Can you quantify it?
>>>>
>>>> I was comparing the bonnie and iozone benchmarks outputs. I will see
>>>> if someone can enlighten me on how to quantify those numbers.
>>>
>>> Ugh.
>>>
>>> Don't bother. Those are two of the worst offenders in the "useless
>>> benchmarks for regression testing" category. Yeah, they *look* like
>>> they give decent numbers, but I've wasted so much time looking at
>>> results from these benhmarks only to find they do basic things wrong
>>> and give numbers that vary simple because you've made a change that
>>> increases or decreases the CPU cache footprint of a code path.
>>>
>>> e.g. IOZone uses the same memory buffer as the source/destination of
>>> all it's IO, and does not touch the contents of it at all. Hence for
>>> small IO, the buffer stays resident in the CPU caches and gives
>>> unrealsitically high throughput results. Worse is the fact that CPU
>>> cache residency of the buffer can change according to the kernel
>>> code path taken, so you can get massive changes in throughput just
>>> by changing the layout of the code without changing any logic....
>>>
>>> IOZone can be useful if you know exactly what you are doing and
>>> using it to test a specific code path with a specific set of
>>> configurations. e.g. comparing ext3/4/xfs/btrfs on the same kernel
>>> and storage is fine. However, the moment you start using it to
>>> compare different kernels, it's a total crap shoot....
>>
>> does anyone have a good benchmark XFS should use to share
>> performance results? A number that we can agree a series does not
>> degrade the filesystem..
>
> ffsb and fio with benchmark style workload definitions, fsmark when
> using total runtime rather than files/s for comparison. I also have a
> bunch of hand written scripts that I've written specifically for
> purpose for testing stuff like directory traversal and IO
> parallelism scalability.
>
> The problem is that common benchmarks like iozone, postmark,
> bonnie++ and dbench are really just load generators that output
> numbers - there are all very sensitive to many non-filesystem
> changes (e.g. VM tunings) and so have to be run "just right" to
> produce meaningful numbers.  That "just right" changes from kernel
> to kernel, machine to machine (running the same kernel!) and
> filesystem to filesystem.
>
> Dbench is particularly sensitive to VM changes and tunings, as well as
> log IO latency.  e.g.  running it on the same system on different
> region of the same disk give significantly different results because
> writes to the log are slower on the inner regions of disk. Indeed,
> just changing the log stripe unit without changing anything else can
> have a marked affect on results...
>
> I have machines here that I can't use IOzone at all for IO sizes of
> less than 4MB because the timing loops it uses don't have enough
> granularity to accurately measure syscall times, and hence it
> doesn't report throughputs reliably. The worst part is that in the
> results output, it doesn't tell you it had problems - youhave to
> look inteh log files to get that information and realise that the
> entire benchmark run was compromised...
>
>> lies, damn lies, statistics and then filesystem benchmarks?! :)
>
> I think I've said that myself many times in the past. ;)

Thank-you for the information.

>
>>> I guess I don't understand what you mean by "loop on
>>> xfs_alloc_vextent()" then.
>>>
>>> The problem I see above is this:
>>>
>>> thread 1		worker 1		worker 2..max
>>> xfs_bmapi_write(userdata)
>>
>>      loops here calling xfs_bmapi_alloc()
>
> I don't think it can for userdata allocations. Metadata, yes,
> userdata, no.
>
>>>    xfs_bmapi_allocate(user)
>>>      xfs_alloc_vextent(user)
>>>        wait
>>>
>>> 			_xfs_alloc_vextent()
>>> 			locks AGF
>>
>> 			first loop it takes the lock
>
>> 			one of the next times through the above
>> 			 loop it cannot get a worker. deadlock here.
>
>>
>> 			I saved the xfs_bmalloca and fs_alloc_arg when
>> 			 allocating a buffer to verify the paths.
>>
>>> 						_xfs_alloc_vextent()
>>> 						blocks on AGF lock
>>
>>> 			completes allocation
>>>
>>>        <returns with AGF locked in transaction>
>
> And then xfs_bmapi_allocate() increments bma->nallocs after each
> successful allocation. Hence the only way we can loop there for
> userdata allocations is if the nmap parameter passes in is>  1.
>
> When it returns to xfs-bmapi_write(), it trims and updates the map,
> and then this code prevents it from looping:
>
>                  /*
>                   * If we're done, stop now.  Stop when we've allocated
>                   * XFS_BMAP_MAX_NMAP extents no matter what.  Otherwise
>                   * the transaction may get too big.
>                   */
>                  if (bno>= end || n>= *nmap || bma.nallocs>= *nmap)
>                          break;
>
> Because the userdata callers use these values for *nmap:
>
> 	xfs_iomap_write_direct		nimaps = 1;
> 	xfs_iomap_write_allocate	nimaps = 1;
> 	xfs_iomap_write_unwritten	nimaps = 1;
> 	xfs_alloc_file_space		nimaps = 1;
>
> So basically there will break out of the loop after a single
> allocation, and hence should not be getting stuck on a second
> allocation in xfs_bmapi_allocate()/xfs_alloc_vextent() with the AGF
> locked from this code path.
>
>
>>>      xfs_bmap_add_extent_hole_real
>>>        xfs_bmap_extents_to_btree
>>>          xfs_alloc_vextent(user)
>>>            wait
>>
>> 	this does not need a worker, and since in the same
>
> I agree that it does not *need* a worker, but it will *use* a worker
> thread if args->userdata is not initialised to zero.
>
>>             transaction all locks to the AGF buffer are recursive locks.
>
> Right, the worker won't block on the AGF lock, but if it can't get a
> worker because they are all blocked on the AGF lock, then it
> deadlocks.
>
> As I said before, I cannot see any other path that will trigger a
> userdata allocation with the AGF locked. So does patch 3 by itself
> solve the problem?
>


I thought I tried it, but I will retry it when I have a free machine.


--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2012-09-27 20:09 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-19 16:31 [PATCH 0/3] xfs: allocation worker causes freelist buffer lock hang tinguely
2012-09-19 16:31 ` [PATCH 1/3] xfs: restrict allocate worker to x86_64 tinguely
2012-09-19 21:54   ` Dave Chinner
2012-09-20 17:37     ` Mark Tinguely
2012-09-24 17:37       ` Ben Myers
2012-09-25  0:14         ` Dave Chinner
2012-09-19 16:31 ` [PATCH 2/3] xfs: move allocate worker tinguely
2012-09-19 16:31 ` [PATCH 3/3] xfs: zero allocation_args on the kernel stack tinguely
2012-09-19 23:41   ` Dave Chinner
2012-09-20 18:16   ` [PATCH 3/3 v2] " Mark Tinguely
2012-09-25 20:20     ` Ben Myers
2012-10-18 22:52     ` Ben Myers
2012-09-19 23:34 ` [PATCH 0/3] xfs: allocation worker causes freelist buffer lock hang Dave Chinner
2012-09-20 13:49   ` Mark Tinguely
2012-09-24 13:25   ` Dave Chinner
2012-09-24 17:11 ` Ben Myers
2012-09-24 18:09   ` Mark Tinguely
2012-09-25  0:56     ` Dave Chinner
2012-09-25 15:14       ` Mark Tinguely
2012-09-25 22:01         ` Dave Chinner
2012-09-26 14:14           ` Mark Tinguely
2012-09-26 23:41             ` Dave Chinner
2012-09-27 20:10               ` Mark Tinguely [this message]
2012-09-28  3:08         ` Dave Chinner
2012-10-01 22:10           ` [PATCH 0/3] xfs: allocation worker causes freelist buffer lock Mark Tinguely
2012-10-01 23:10             ` Dave Chinner
2012-09-27 22:48     ` [PATCH 0/3] xfs: allocation worker causes freelist buffer lock hang Ben Myers
2012-09-27 23:17       ` Mark Tinguely
2012-09-27 23:27         ` Mark Tinguely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5064B2BE.8060405@sgi.com \
    --to=tinguely@sgi.com \
    --cc=bpm@sgi.com \
    --cc=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.