From: Linda Walsh <xfs@tlinx.org>
To: stan@hardwarefreak.com
Cc: xfs-oss <xfs@oss.sgi.com>
Subject: Re: does having ~Ncore+1? kworkers flushing XFS to 1 disk improve throughput?
Date: Sat, 24 Aug 2013 16:22:11 -0700
Message-ID: <52194023.2060702@tlinx.org>
In-Reply-To: <521904F4.90208@hardwarefreak.com>



Stan Hoeppner wrote:
> On 8/24/2013 12:18 PM, Linda Walsh wrote:
>>
>> Stan Hoeppner wrote:
>>> On 8/23/2013 9:33 PM, Linda Walsh wrote:
>>>
>>>> So what are all the kworkers doing and does having 6 of them do
>>>> things at the same time really help disk-throughput?
>>>>
>>>> Seems like they would conflict w/each other, cause disk
>>>> contention, and extra fragmentation as they do things?  If they
>>>> were all writing to separate disks, that would make sense, but do
>>>> that many kworker threads need to be finishing out disk I/O on
>>>> 1 disk?
>>> https://raw.github.com/torvalds/linux/master/Documentation/workqueue.txt
>> ----
>>
>> Thanks for the pointer.
>>
>> I see ways to limit #workers/cpu if they were hogging too much cpu,
>> which isn't the problem..  My concern is that the work they are
>> doing is all writing info back to the same physical disk -- and that
>> while >1 writer can improve throughput, generally, it would be best
>> if the pending I/O was sorted in disk order and written out using
>> the elevator algorithm.  I.e. I can't imagine that it takes 6-8
>> processes (mostly limiting themselves to 1 NUMA node) to keep the
>> elevator filled?
> 
> You're making a number of incorrect assumptions here.  The work queues
> are generic, which is clearly spelled out in the document above.  The
> kworker threads are just that, kernel threads, not processes as you
> assume above.
----
	Sorry, terminology.  Linux threads are implemented as processes with
minor differences; as the kernel sees them, though, they are threads.

>  XFS is not the only subsystem that uses them.  Any
> subsystem or driver can use work queues.  You can't tell what's
> executing within a kworker thread from top or ps output.  You must look
> at the stack trace.
> 
> The work you are seeing in those 7 or 8 kworker threads is not all
> parallel XFS work.  Your block device driver, whether libata, SCSI, or
> proprietary RAID card driver, is placing work in these queues as well.
---
Hmm... I hadn't thought of the driver doing that.  I sort of thought
it just took blocks as fed by the kernel, and when it was done with
a DMA, it told the kernel it was done and was ready for another.

I thought such drivers did direct I/O at that point, i.e. that they sit
below the elevator algorithm?


> The work queues are not limited to filesystems and block device drivers.
>  Any device driver or kernel subsystem can use work queues.
---
	True, but when I see a specific number of them come up and work
constantly while I unpack a tar, I take them to be related to that
command.  What else would use that much CPU?

> 
> Nothing bypasses the elevator; sectors are still sorted.  But keep in
> mind if you're using a hardware RAID controller -it- does the final
> sorting of writeback anyway, so this is a non issue.
It's an LSI RAID controller.
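Which elevator each block device is actually using can be read from
/sys/block/<dev>/queue/scheduler, where the active one is shown in
brackets (e.g. "noop deadline [cfq]").  A small sketch, assuming the
usual Linux sysfs layout; active_schedulers is a hypothetical helper:

```python
from pathlib import Path


def active_schedulers(sys_block="/sys/block"):
    """Map each block device name to its active I/O scheduler.

    The active scheduler is the bracketed entry in queue/scheduler;
    devices without that file are skipped.
    """
    result = {}
    for dev in sorted(Path(sys_block).iterdir()):
        try:
            text = (dev / "queue" / "scheduler").read_text()
        except OSError:
            continue  # no request queue or no scheduler entry
        for name in text.split():
            if name.startswith("[") and name.endswith("]"):
                result[dev.name] = name[1:-1]
    return result
```

With a hardware RAID controller underneath, as noted above, the
controller may reorder writes again after this scheduler has sorted
them.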

> 
> So in a nutshell, whatever performance issue you're having, if you
> indeed have an issue, isn't caused by work queues or the number of
> kworker threads on your system, per CPU, or otherwise.

Um... but it could be made worse by having an excessive number of
threads all contending for a limited resource.  The more contenders
there are for a limited resource, the more the scheduler has to sort
out who gets access to it next.

If you have 6 threads dumping sectors to different areas of the
disk, so that a seek is needed before each thread's output can
complete, then you pay a seek penalty on every thread switch.  If
instead the requests were coalesced and sorted into 1 queue, 1 worker
could do the work of the 6 without the extra seeks between the
different kworkers emptying their queues.


> You need to look
> elsewhere for the bottleneck.  Given it's lightning fast up to the point
> buffers start flushing to disk it's pretty clear your spindles simply
> can't keep up.
----
	That's not the point (though it is a given).  What I'm focusing on
is how the kernel handles a backlog.

	If I want throughput, I use 1 writer, writing to an unfragmented file
that won't require seeks.  If I use 2 writers, each writing to an
unfragmented file, and run them at the same time, it's almost certain
that throughput will drop, since the disk will have to seek back and
forth between the two files to give "disk-write-resources" to each writer.

	It would be faster to write both files sequentially rather than in
parallel.  The disk is limited to ~1GB/s, and every seek needed to get
files out reduces that.  So tar splats 5000 files into memory, and then
it takes time for those to be written.  If I write 5000 files
sequentially with 1 writer, I will get better performance than if I use
25 threads each dumping 200 files in parallel.  The disk subsystem's
responsiveness drops due to all the seeks between writes, whereas 1 big
sorted write could go out in 1-2 elevator passes... I don't think it is
being that efficient.  Hence my question about whether having "too many
writers" accessing a resource at the same time is really the optimal
way to improve throughput.
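The cost of those seeks can be put into numbers with a back-of-envelope
model: effective throughput is the bytes written divided by transfer
time plus total seek time.  The 5000 files and ~1GB/s come from above;
the 64 KiB file size, the 8 ms seek cost, and the "one seek per file
switch" worst case for 25 interleaved writers are all assumptions for
illustration, not measurements:

```python
def effective_throughput(total_bytes, bandwidth, seeks, seek_time):
    """Bytes/second achieved when `seeks` head movements, each costing
    `seek_time` seconds, are added to the pure transfer time."""
    transfer_time = total_bytes / bandwidth
    return total_bytes / (transfer_time + seeks * seek_time)


FILES = 5000             # from the tar example
FILE_SIZE = 64 * 1024    # assumption: 64 KiB per file
BANDWIDTH = 1e9          # ~1GB/s sequential, taken as 10**9 bytes/s
SEEK_TIME = 0.008        # assumption: ~8 ms per seek on one spindle

total = FILES * FILE_SIZE
# 1 writer, sorted: assume ~2 seeks (1-2 elevator passes).
one_writer = effective_throughput(total, BANDWIDTH, seeks=2, seek_time=SEEK_TIME)
# 25 interleaved writers, worst case: a seek every time the disk switches files.
many_writers = effective_throughput(total, BANDWIDTH, seeks=FILES, seek_time=SEEK_TIME)

print(f"1 writer:   {one_writer / 1e6:8.1f} MB/s")
print(f"25 writers: {many_writers / 1e6:8.1f} MB/s")
```

The point is the shape, not the exact figures: once the seek term
dominates, transfer bandwidth hardly matters.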

	I'm not saying there is a "problem" per se; I'm just wondering how
so many writers avoid having the disk seek all over the place to
service their requests round-robin.

	FWIW, the disk can probably handle 2-3 writers and show improvement
over a single one, but beyond that I start to see an overall drop in
throughput.
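A rough way to check the 2-3 writer observation is to time N
concurrent writers directly.  The sketch below spreads a fixed set of
files across N threads and fsyncs each file so the page cache does not
hide the disk.  measure and write_files are hypothetical helpers, and
the 5000 x 64 KiB workload is an assumption; note this measures N
application writers doing synchronous writes, not the kernel's kworker
writeback path:

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_files(directory, names, size):
    """Write each named file with `size` bytes and fsync it to the device."""
    block = b"\0" * size
    for name in names:
        with open(os.path.join(directory, name), "wb") as f:
            f.write(block)
            f.flush()
            os.fsync(f.fileno())


def measure(target_dir, writers, files=5000, size=64 * 1024):
    """Split `files` files across `writers` threads; return MB/s achieved."""
    names = [f"f{i:05d}" for i in range(files)]
    chunks = [names[w::writers] for w in range(writers)]
    # Fresh scratch directory per run, removed afterwards (cleanup not timed).
    with tempfile.TemporaryDirectory(dir=target_dir) as work:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=writers) as pool:
            futures = [pool.submit(write_files, work, c, size) for c in chunks]
            for future in futures:
                future.result()  # re-raise any I/O error from a worker
        elapsed = time.perf_counter() - start
    return files * size / elapsed / 1e6
```

For example, `for n in (1, 2, 3, 6, 25): print(n, measure("/mnt/xfs", n))`,
where /mnt/xfs is a hypothetical mount point on the array; each run
writes about 328 MB with the defaults.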

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

