public inbox for linux-kernel@vger.kernel.org
From: "Curt" <curt@northarc.com>
To: "Andrew Morton" <akpm@osdl.org>
Cc: <linux-kernel@vger.kernel.org>
Subject: Re: Raw devices broken in 2.6.1? AND- 2.6.1 I/O degraded?
Date: Fri, 30 Jan 2004 01:34:53 -0500	[thread overview]
Message-ID: <015301c3e6fb$2dbc22b0$0300000a@falcon> (raw)
In-Reply-To: 20040129205605.5bd140b2.akpm@osdl.org

> > I finally narrowed it down to the disk subsystem, and the test program
> > shows the meat of it. When there is massive contention for a file, or
> > just heavy (VERY heavy) volume, the 2.6.1 kernel (presumably the
> > filesystem portion) falls over dead. The test program doesn't show
> > death, but could by just upping the thrasher count a bit.
>
> 2.6.1 had a readahead bug which will adversely affect workloads such as
> this.  Apart from that, there is still quite a lot of tuning work to be
> done in 2.6, as the tree settles down and more testers come on board.
> People are working on the VM and readahead code as we speak.  It's always
> best to test the most up-to-date tree.

Will do.

> > I have been telling our customers for over a year now "Don't
> > worry, Linux will be able to rock with the new threading model"
> > and then..
>
> You wouldn't expect huge gain from 2.6 in the I/O department.  Your
> workload is seek-limited.  I am seeing some small benefits from the
> anticipatory I/O scheduler with your test though.

Nice to be amongst people who know that; usually I'm the one beating my head
against customers, trying to explain to them how even the fanciest
super-drives will be brought to their knees by our random-access patterns.
My kingdom for a quantum leap in seek penalties.

The gains we (Highwinds-Software) were expecting to see in 2.6.x were in
thread scheduling: as a massively multithreaded database application (1000
threads without breaking a sweat), we were very anxious for true
kernel-space LWPs in Linux. Our I/O world is well outside the CPU, in RAID
stacks; there's not a whole lot any kernel can do to help us there, and 99%
of the time it's driver issues.


> I'm fairly suspicious about the disparity between the time taken for those
> initial large writes.  Two possible reasons come to mind:
>
> 1) Your disk isn't using DMA.  Use `hdparm' to check it, and check your
>    kernel IDE config if it is not using DMA.

To my great surprise DMA was NOT enabled, even though support for it was
(apparently) compiled into the kernel. That's a puzzle for another day. On
different hardware the problem did seem to go away.

> 2) 2.4 sets the dirty memory writeback thresholds much higher: 40%/60%
>    vs 10%/40%.  So on a 512M box it is possible that there is much more
>    dirty, unwritten-back memory after the timing period has completed than
>    under 2.6.  Although this difference in tuning can affect real-world
>    workloads, it is really an error in the testing methodology.
>    Generally, the timing should include an fsync() so that all I/O which
>    the program issues has completed.

Yeah, I know, but it was late; I just waited for the drive light to turn
off before I ran it the second time ;)
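For the record, the drive-light method can be retired; a sketch with GNU dd
(conv=fsync is a coreutils extension, so this assumes a reasonably recent
dd):

```shell
# Time a burst of writes *including* the final flush to disk, so the
# differing dirty-writeback thresholds of 2.4 vs 2.6 can't hide
# unwritten data in the page cache.  conv=fsync makes dd call fsync()
# on the output file before it exits.
time dd if=/dev/zero of=testfile bs=1M count=16 conv=fsync
rm testfile
```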

>    Or you can put 2.6 on par by setting
>    /proc/sys/vm/dirty_background_ratio to 40 and dirty_ratio to 60.

Okay will do, is there a good comprehensive resource where I can read up on
these (and presumably many other I/O related) variables?
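In the meantime, the tuning itself is just the numbers above poked into
/proc (as root; they revert at reboot unless re-applied from an init
script):

```shell
# Match 2.4's writeback thresholds, per Andrew's suggestion.
echo 40 > /proc/sys/vm/dirty_background_ratio
echo 60 > /proc/sys/vm/dirty_ratio
```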

> Longer-term, if your customers are using scsi, you should ensure that the
> disks do not use a tag queue depth of more than 4 or 8.  More than that
> and the anticipatory scheduler becomes ineffective and you won't get that
> multithreaded-read goodness.

I've heard tell of tweaking the elevator parameter to 'deadline'; again,
could you point me to a resource where I can read up on this? And forgive
the newbie question, but is this a boot-time parameter, a bit I can set in
the /proc system, or both?
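For reference, the boot-time form is certain; the runtime sysfs knob is
something that appeared in later 2.6 kernels, so I'm not sure it exists in
2.6.1 (device name hda is just an example):

```shell
# Boot-time: append elevator=deadline to the kernel line in lilo.conf
# or grub's menu.lst, e.g.:
#   kernel /vmlinuz-2.6.1 root=/dev/hda1 ro elevator=deadline
#
# Runtime, per disk, via sysfs (later 2.6 kernels; may not be in 2.6.1):
#   cat /sys/block/hda/queue/scheduler    # active one shown in [brackets]
#   echo deadline > /sys/block/hda/queue/scheduler
```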

> Please stay in touch, btw.  If we cannot get applications such as yours
> working well, we've wasted our time...

I'll do what I can to provide real-world feedback, I want this to work too.

-Curt



Thread overview: 7+ messages
2004-01-29 22:44 Raw devices broken in 2.6.1? Curt Hartung
2004-01-30  0:38 ` Andrew Morton
2004-01-30  1:30   ` Raw devices broken in 2.6.1? AND- 2.6.1 I/O degraded? Curt Hartung
2004-01-30  4:56     ` Andrew Morton
2004-01-30  6:34       ` Curt [this message]
2004-01-30  6:46         ` Andrew Morton
2004-01-30  7:13           ` Nick Piggin
