public inbox for linux-xfs@vger.kernel.org
From: Marc Lehmann <schmorp@schmorp.de>
To: Michael Monnerie <michael.monnerie@is.it-management.at>
Cc: xfs@oss.sgi.com
Subject: Re: frequent kernel BUG and lockups - 2.6.39 + xfs_fsr
Date: Tue, 9 Aug 2011 13:15:27 +0200	[thread overview]
Message-ID: <20110809111526.GA7631@schmorp.de> (raw)
In-Reply-To: <201108091210.50204@zmi.at>

On Tue, Aug 09, 2011 at 12:10:48PM +0200, Michael Monnerie <michael.monnerie@is.it-management.at> wrote:
> First of all, please calm down. Getting personal is not bringing us 
> anywhere.

Well, it's not me who's getting personal, so...?

> > Logic error - if I can corrupt an XFS without special privileges then
> > this is not a problem with xfs_fsr, but simply a kernel bug in the
> > xfs code. And a rather big one, one step below a remote exploit.
> 
> No, it's not a kernel bug because as long as you don't use xfs_fsr, 
> nothing will ever happen.

"As long as you don't boot, it will not crash".

xfs_fsr uses syscalls, just like other applications. According to your
(wrong) logic, if an application uses chown and this causes a kernel oops,
this is also not a kernel bug.

That's of course wrong - it's the kernel that crashes when an application
performs certain access patterns.

> (rw,nodiratime,relatime,logbufs=8,logbsize=256k,attr2,barrier,largeio,swalloc)
> and sometimes also 
> ,allocsize=64m

As has been reported on this list, this option is really harmful on
current xfs - in my case, it led to xfs returning ENOSPC even when the
disk was 40% empty (~188GB free).

> and I can't find evidence for fragmentation that would be harmful. Yes 

Well, define "harmful" - slow logfile reads aren't what I consider
"harmful" either. It's just very very slow.

> The allocsize option helps a lot there. I looked at one webserver access 
> log, it has 640MB with 99 fragments, but that's not a lot. On our 
> Spamgate I see 250MB logs with 374 fragments.

Well, if it were one fragment, you could read it in 4-5 seconds; at 374
fragments, it's probably around 6-7 seconds. That's not harmful, but if
you extrapolate this to a few gigabytes and a lot of files, it becomes
quite the overhead.
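A rough back-of-the-envelope sketch of that extrapolation - the throughput
and per-seek figures below are assumptions picked to match the 4-5 second
single-fragment estimate, not measurements:

```shell
#!/bin/sh
# Rough estimate of read time for a fragmented file.
# Assumed figures: ~140 MB/s sustained throughput, ~8 ms per extra seek.
size_mb=640
fragments=374
awk -v size="$size_mb" -v frags="$fragments" 'BEGIN {
    transfer = size / 140           # seconds spent streaming data
    seeks    = (frags - 1) * 0.008  # seconds lost to head movement
    printf "transfer %.1fs + seeks %.1fs = %.1fs total\n",
           transfer, seeks, transfer + seeks
}'
```

At 374 fragments the seek overhead is still tolerable; multiply the file
count by a few hundred and it dominates.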

> don't use the allocsize option there, which I changed now that I looked 

That allocsize option is no longer reasonable with newer kernels, as the
kernel will reserve 64MB of disk space even for 1KB files, indefinitely.
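To put numbers on that - the file count is hypothetical, just for scale -
if the full 64MB preallocation is kept per file, a spool of small files
reserves vastly more space than it stores:

```shell
#!/bin/sh
# Hypothetical: 10,000 files of 1 KB each under allocsize=64m, assuming
# the kernel keeps the full 64 MB speculative preallocation per file.
awk 'BEGIN {
    files    = 10000
    stored   = files * 1 / 1024 / 1024   # GB actually stored
    reserved = files * 64 / 1024         # GB reserved by preallocation
    printf "stored %.2f GB, reserved %.0f GB\n", stored, reserved
}'
```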

> > If XFS is bad at append-only workloads, which is the most common type
> > of workload, then XFS fails to be very relevant for the real world.
> 
> may be valid for your world, not mine. We have webservers, fileservers 
> and database servers, all of which are not really append style, but more 
> delete-and-recreate.

If you find a way of recreating files without appending to them, let me
know.

The problem with fragmentation is that it happens even with only a few
writers in "create file" workloads (which do append...).

You probably make a distinction between "writing a file fast" and "writing
a file slowly", but that is not a qualitative difference. On busy servers
that create a lot of files, you get fragmentation the same way as on less
busy servers that write files more slowly. There is little to no
difference in the resulting patterns.

> Well, db-servers are rather exceptional here.

Yes, append-style writes make up the vast majority of disk writes on a
normal system, db-servers excepted indeed.

> But if the numbers for fragmentation on your servers are true, you must 
> have a very good test case for fragmentation prevention. Therefore it 
> could be really interesting if you could grab what Dave Chinner asked 
> for:

I'll keep it in mind.

> And maybe he could use it for optimizations. Is there any tool on Linux 
> to record such I/O patterns?

I presume strace would do, but that's where the "lot of work" comes in. If
there is a ready-to-use tool, that would of course make it easy.
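strace output can at least be post-processed with standard tools. As a
sketch - the sample lines below are synthetic, in strace's usual output
format - summing bytes written per file descriptor:

```shell
#!/bin/sh
# Sum bytes written per file descriptor from strace-style write() lines.
# The sample input below is canned; in practice it would come from
# something like: strace -f -e trace=write -o trace.log CMD
awk '/^write\(/ {
    fd = $1                       # first field looks like "write(3,"
    sub(/^write\(/, "", fd); sub(/,$/, "", fd)
    bytes[fd] += $NF              # trailing return value = bytes written
} END {
    for (fd in bytes) printf "fd %s: %d bytes\n", fd, bytes[fd]
}' <<'EOF' | sort
write(3, "GET / HTTP/1.1\r\n", 16) = 16
write(3, "Host: example.net\r\n", 19) = 19
write(5, "log entry\n", 10) = 10
EOF
```

That only captures syscall-level patterns, of course, not what the
allocator does with them.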

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 18+ messages
2011-08-06 12:25 frequent kernel BUG and lockups - 2.6.39 + xfs_fsr Marc Lehmann
2011-08-06 14:20 ` Dave Chinner
2011-08-07  1:42   ` Marc Lehmann
2011-08-07 10:26     ` Dave Chinner
2011-08-08 19:02       ` Marc Lehmann
2011-08-09 10:10         ` Michael Monnerie
2011-08-09 11:15           ` Marc Lehmann [this message]
2011-08-10  6:59             ` Michael Monnerie
2011-08-11 22:04               ` Marc Lehmann
2011-08-12  4:05                 ` Dave Chinner
2011-08-26  8:08                   ` Marc Lehmann
2011-08-31 12:45                     ` Dave Chinner
2011-08-10 14:16             ` Dave Chinner
2011-08-11 22:07               ` Marc Lehmann
2011-08-09  9:16       ` Marc Lehmann
2011-08-09 11:35         ` Dave Chinner
2011-08-09 16:35           ` Marc Lehmann
2011-08-09 22:31             ` Dave Chinner
