* Delaylog information enquiry
From: Frank . @ 2014-07-29 8:53 UTC (permalink / raw)
To: xfs@oss.sgi.com
Hello.
I just wanted to have more information about the delaylog feature.
From what I understand, it seems to be a feature common to several filesystems. It is supposed to retain information such as metadata for a time (how long?). Unfortunately, I could not find further information about the journaling log section in the official XFS documentation.
I just figured out that the delaylog feature is now always enabled and there is no way to disable it (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xfs.txt?id=HEAD).
Whatever this information may be, I understand that it is held temporarily in RAM.
Recently, I had a crash on a server and had to run the repair procedure, which worked fine.
But I would like to disable this feature to avoid any temporary data being left unwritten to disk. (Write cache is already disabled on both the hard drives and the RAID controller.)
Perhaps disabling it is a bad idea. If so, I would like your opinion on where memory corruption could happen.
Any help would be much appreciated.
Thank you.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Delaylog information enquiry
From: Brian Foster @ 2014-07-29 12:38 UTC (permalink / raw)
To: Frank .; +Cc: xfs@oss.sgi.com
On Tue, Jul 29, 2014 at 10:53:09AM +0200, Frank . wrote:
> Hello.
>
> I just wanted to have more information about the delaylog feature.
> From what I understood it seems to be a common feature from different FS. It's supposed to retain information such as metadata for a time ( how much ?). Unfortunately, I could not find further information about journaling log section in the XFS official documentation.
> I just figured out that delaylog feature is now included and there is no way to disable it (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xfs.txt?id=HEAD).
>
There is a design document for XFS delayed logging co-located with the
xfs doc:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xfs-delayed-logging-design.txt?id=HEAD
I'm not an expert on the delayed logging infrastructure so I can't give
details, but it's basically a change to aggregate logged items into a
list (the committed item list, or CIL) and "local" areas of memory (log
vectors) at transaction commit time, rather than logging directly into
the log buffers. The benefits and tradeoffs of this are described in the
link above. One tradeoff is that more items can be aggregated before a
checkpoint occurs, which naturally means more items are batched in
memory and written to the log at a time.
This in turn means that in the event of a crash, more logged items are
lost than with the older, less efficient implementation. This doesn't
affect the consistency of the fs, which is the purpose of the log.
> Whatever the information it could be, I understood that this is a temporary memory located in RAM.
> Recently, I had a crash on a server and I had to execute the repair procedure which worked fine.
>
A crash should typically only require a log replay and that happens
automatically on the next mount. If you experience otherwise, it's a
good idea to report that to the list with the data listed here:
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> But I would like to disable this feature to prevent any temporary data not to be written to disk. (Write cache is already disabled on both hard drive and raid controller).
>
> Perhaps it's a bad idea disabling it. If so, I would like to have your opinion about where memory corruption could happen.
>
Delayed logging is not configurable these days. The original
implementation was optional via a mount option, but my understanding is
that might have been more of a precaution for a new feature than a real
tuning option.
If you want to ensure consistency of certain operations, those
applications should issue fsync() calls as appropriate. You could also
look into the 'wsync' mount option (and probably expect a significant
performance hit).
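As an illustration of the fsync() approach, here is a minimal shell sketch (the file name is hypothetical; recent coreutils `sync` accepts file arguments and calls fsync(2) on them, and `dd conv=fsync` achieves the same for data written through dd):

```shell
# Append a record, then force it to stable storage before relying on it.
printf 'order=42 state=committed\n' >> /tmp/app.log

# 'sync FILE' issues fsync(2) on that file.
sync /tmp/app.log
```

Only after the sync returns successfully can the record be assumed to be on stable storage.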
Brian
> Any help would be much appreciated.
> Thank you.
>
>
* Re: Delaylog information enquiry
From: Dave Chinner @ 2014-07-29 23:41 UTC (permalink / raw)
To: Brian Foster; +Cc: Frank ., xfs@oss.sgi.com
On Tue, Jul 29, 2014 at 08:38:16AM -0400, Brian Foster wrote:
> On Tue, Jul 29, 2014 at 10:53:09AM +0200, Frank . wrote:
> > Hello.
> >
> > I just wanted to have more information about the delaylog feature.
> > From what I understood it seems to be a common feature from different FS. It's supposed to retain information such as metadata for a time ( how much ?). Unfortunately, I could not find further information about journaling log section in the XFS official documentation.
> > I just figured out that delaylog feature is now included and there is no way to disable it (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xfs.txt?id=HEAD).
> >
>
> There is a design document for XFS delayed logging co-located with the
> xfs doc:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xfs-delayed-logging-design.txt?id=HEAD
Or, indeed, here:
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs-documentation.git;a=blob;f=design/xfs-delayed-logging-design.asciidoc
> I'm not an expert on the delayed logging infrastructure so I can't give
> details, but it's basically a change to aggregate logged items into a
> list (committed item list - CIL) and "local" areas of memory (log
> vectors) at transaction commit time rather than logging directly into
> the log buffers. The benefits and tradeoffs of this are described in the
> link above. One tradeoff is that more items can be aggregated before a
> checkpoint occurs, so that naturally means more items are batched in
> memory and written to the log at a time.
>
> This in turn means that in the event of a crash, more logged items are
> lost than with the older, less efficient implementation. This doesn't affect
> the consistency of the fs, which is the purpose of the log.
In a nutshell.
Basically, logging in XFS is asynchronous unless the user application,
specific operational constraints or mount options direct it to be
synchronous.
> > Whatever the information it could be, I understood that this is a temporary memory located in RAM.
> > Recently, I had a crash on a server and I had to execute the repair procedure which worked fine.
> >
>
> A crash should typically only require a log replay and that happens
> automatically on the next mount. If you experience otherwise, it's a
> good idea to report that to the list with the data listed here:
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> > But I would like to disable this feature to prevent any temporary data not to be written to disk. (Write cache is already disabled on both hard drive and raid controller).
> >
> > Perhaps it's a bad idea disabling it. If so, I would like to have your opinion about where memory corruption could happen.
> >
>
> Delayed logging is not configurable these days. The original
> implementation was optional via a mount option, but my understanding is
> that might have been more of a precaution for a new feature than a real
> tuning option.
>
> If you want to ensure consistency of certain operations, those
> applications should issue fsync() calls as appropriate. You could also
> look into the 'wsync' mount option (and probably expect a significant
> performance hit).
Using the 'wsync' or 'dirsync' mount options effectively causes the
majority of transactions to be synchronous - this has always been the
case, even before delayed logging was implemented - so that once a
user-visible namespace operation completes, it is guaranteed to be on stable
storage. This is necessary for HA environments so that failover from
one server to another doesn't result in files appearing or
disappearing on failover...
Note that this does not change file data behaviour. In this case you
need to add the "sync" mount option, which forces all buffered IO to
be synchronous and so will be *very slow*. But if you've already
turned off the BBWC on the RAID controller then your storage is
already terribly slow and so you probably won't care about making
performance even worse...
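For reference, these options would be applied like so (device and mount point are placeholders; the same options can also go in the options column of /etc/fstab):

```shell
# Synchronous namespace operations only (create, unlink, rename, ...):
mount -o remount,wsync /dev/sdX1 /data

# All buffered file IO synchronous as well -- *very slow*:
mount -o remount,sync,wsync /dev/sdX1 /data
```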
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Delaylog information enquiry
From: Grozdan @ 2014-07-30 5:42 UTC (permalink / raw)
To: Dave Chinner; +Cc: Brian Foster, Frank ., xfs@oss.sgi.com
On Wed, Jul 30, 2014 at 1:41 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Tue, Jul 29, 2014 at 08:38:16AM -0400, Brian Foster wrote:
>> On Tue, Jul 29, 2014 at 10:53:09AM +0200, Frank . wrote:
>> > Hello.
>> >
>> > I just wanted to have more information about the delaylog feature.
>> > From what I understood it seems to be a common feature from different FS. It's supposed to retain information such as metadata for a time ( how much ?). Unfortunately, I could not find further information about journaling log section in the XFS official documentation.
>> > I just figured out that delaylog feature is now included and there is no way to disable it (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xfs.txt?id=HEAD).
>> >
>>
>> There is a design document for XFS delayed logging co-located with the
>> xfs doc:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xfs-delayed-logging-design.txt?id=HEAD
>
> Or, indeed, here:
>
> http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs-documentation.git;a=blob;f=design/xfs-delayed-logging-design.asciidoc
>
>> I'm not an expert on the delayed logging infrastructure so I can't give
>> details, but it's basically a change to aggregate logged items into a
>> list (committed item list - CIL) and "local" areas of memory (log
>> vectors) at transaction commit time rather than logging directly into
>> the log buffers. The benefits and tradeoffs of this are described in the
>> link above. One tradeoff is that more items can be aggregated before a
>> checkpoint occurs, so that naturally means more items are batched in
>> memory and written to the log at a time.
>>
>> This in turn means that in the event of a crash, more logged items are
>> lost than with the older, less efficient implementation. This doesn't affect
>> the consistency of the fs, which is the purpose of the log.
>
> In a nutshell.
>
> Basically, logging in XFS is asynchronous unless directed by the
> user application, specific operational constraints or mount options
> to be synchronous.
>
>> > Whatever the information it could be, I understood that this is a temporary memory located in RAM.
>> > Recently, I had a crash on a server and I had to execute the repair procedure which worked fine.
>> >
>>
>> A crash should typically only require a log replay and that happens
>> automatically on the next mount. If you experience otherwise, it's a
>> good idea to report that to the list with the data listed here:
>>
>> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>>
>> > But I would like to disable this feature to prevent any temporary data not to be written to disk. (Write cache is already disabled on both hard drive and raid controller).
>> >
>> > Perhaps it's a bad idea disabling it. If so, I would like to have your opinion about where memory corruption could happen.
>> >
>>
>> Delayed logging is not configurable these days. The original
>> implementation was optional via a mount option, but my understanding is
>> that might have been more of a precaution for a new feature than a real
>> tuning option.
>>
>> If you want to ensure consistency of certain operations, those
>> applications should issue fsync() calls as appropriate. You could also
>> look into the 'wsync' mount option (and probably expect a significant
>> performance hit).
>
> Using the 'wsync' or 'dirsync' mount options effectively causes the
> majority of transactions to be synchronous - this has always been the case, even
> before delayed logging was implemented - so that once a user visible
> namespace operation completes, it is guaranteed to be on stable
> storage. This is necessary for HA environments so that failover from
> one server to another doesn't result in files appearing or
> disappearing on failover...
>
> Note that this does not change file data behaviour. In this case you
> need to add the "sync" mount option, which forces all buffered IO to
> be synchronous and so will be *very slow*. But if you've already
> turned off the BBWC on the RAID controller then your storage is
> already terribly slow and so you probably won't care about making
> performance even worse...
Dave, excuse my ignorant questions
I know the Linux kernel keeps data in cache for up to 30 seconds before
a kernel daemon flushes it to disk, unless the configured dirty ratio
(which is 40% of RAM, IIRC) is reached first, in which case the flush
happens sooner.
What I did is lower these 30 seconds to 5 seconds, so data is flushed
to disk every 5 seconds (I've set dirty_expire_centisecs to 500).
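Concretely (values are in centiseconds; the write itself needs root, and an entry in /etc/sysctl.conf makes it persistent -- the exact commands are a sketch of what I did):

```shell
# Inspect the current expiry and writeback-interval values.
cat /proc/sys/vm/dirty_expire_centisecs      # dirty data older than this gets flushed
cat /proc/sys/vm/dirty_writeback_centisecs   # how often the flusher thread wakes up

# The change itself (root required):
#   sysctl -w vm.dirty_expire_centisecs=500
```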
So, are there any drawbacks in doing this? I mean, I don't care *that*
much about performance, but I do want my dirty data to be on storage in
a reasonable amount of time. I looked at the various sync mount options,
but they are all synchronous, so my impression is they'll be slower than
giving the kernel 5 seconds to keep data and then flush it.
From an XFS perspective, I'd like to know whether or not this is
recommended. I know that setting the above to 500 centisecs means there
will be more writes to disk, which may result in wear and tear, thus
shortening the lifetime of the storage.
This is a regular desktop system with a single Seagate Constellation
SATA disk, so no RAID, LVM, thin provisioning or anything else.
What do you think? :)
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
--
Yours truly
* Re: Delaylog information enquiry
From: Dave Chinner @ 2014-07-30 8:18 UTC (permalink / raw)
To: Grozdan; +Cc: Brian Foster, Frank ., xfs@oss.sgi.com
On Wed, Jul 30, 2014 at 07:42:32AM +0200, Grozdan wrote:
> On Wed, Jul 30, 2014 at 1:41 AM, Dave Chinner <david@fromorbit.com> wrote:
> > Note that this does not change file data behaviour. In this case you
> > need to add the "sync" mount option, which forces all buffered IO to
> > be synchronous and so will be *very slow*. But if you've already
> > turned off the BBWC on the RAID controller then your storage is
> > already terribly slow and so you probably won't care about making
> > performance even worse...
>
> Dave, excuse my ignorant questions
>
> I know the Linux kernel keeps data in cache up to 30 seconds before a
> kernel daemon flushes it to disk, unless
> the configured dirty ratio (which is 40% of RAM, iirc) is reached
10% of RAM, actually.
> before these 30 seconds so the flush is done before it
>
> What I did is lower these 30 seconds to 5 seconds so every 5 seconds
> data is flushed to disk (I've set the dirty_expire_centisecs to 500).
> So, are there any drawbacks in doing this?
Depends on your workload. For a desktop, you probably won't notice
anything different. For a machine that creates lots of temporary
files and then removes them (e.g. build machines) then it could
crater performance completely because it causes writeback before the
files are removed...
> I mean, I don't care *that*
> much for performance but I do want my dirty data to be on
> storage in a reasonable amount of time. I looked at the various sync
> mount options but they all are synchronous so it is my
> impression they'll be slower than giving the kernel 5 seconds to keep
> data and then flush it.
>
> From XFS perspective, I'd like to know if this is not recommended or
> if it is? I know that with setting the above to 500 centisecs
> means that there will be more writes to disk and potentially may
> result in tear & wear, thus shortening the lifetime of the
> storage
>
> This is a regular desktop system with a single Seagate Constellation
> SATA disk so no RAID, LVM, thin provision or anything else
>
> What do you think? :)
I don't think it really matters either way. I don't change
the writeback time on my workstations, build machines or test
machines, but I actually *increase* it on my laptops to save power
by not writing to disk as often. So if you want a little more
safety, then reducing the writeback timeout shouldn't have any
significant effect on performance or wear unless you are doing
something unusual....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* RE: Delaylog information enquiry
From: Frank . @ 2014-07-30 11:44 UTC (permalink / raw)
To: Dave Chinner; +Cc: neutrino8@gmail.com, xfs@oss.sgi.com
Indeed, I turned the sync and wsync flags on. As expected, I had terribly low performance (1 MB/s for write operations), so I decided to turn them back off (and got my 100 MB/s write throughput back).
I just wanted to reduce as much as possible the unnecessary caching between my VMs and my physical hard drives, knowing that there are up to 8 write cache levels.
I'm getting off the subject a bit, but here is the list. This is only my conclusion; I don't know if I'm right.
- Guest page cache.
- Virtual disk drive write cache. (off KVM cache=directsync)
- Host page cache. (off KVM cache=directsync)
- GlusterFS cache. (off)
- NAS page cache. (?)
- XFS cache (filesystem).
- RAID controller write cache. (off)
- Physical hard drive write cache. (off)
The main difficulty is that I have to gather information from different sources (software vendors / hardware manufacturers) to get an overview of the cache mechanisms. I need to make sure our databases will not be corrupted by a failure of any one of those layers.
If you have any suggestions on where to find information or who to ask, I would be rather grateful.
But at least I had answers about the XFS part.
Thank you very much !
> Date: Wed, 30 Jul 2014 18:18:58 +1000
> From: david@fromorbit.com
> To: neutrino8@gmail.com
> CC: bfoster@redhat.com; frank_1005@msn.com; xfs@oss.sgi.com
> Subject: Re: Delaylog information enquiry
>
> On Wed, Jul 30, 2014 at 07:42:32AM +0200, Grozdan wrote:
> > On Wed, Jul 30, 2014 at 1:41 AM, Dave Chinner <david@fromorbit.com> wrote:
> > > Note that this does not change file data behaviour. In this case you
> > > need to add the "sync" mount option, which forces all buffered IO to
> > > be synchronous and so will be *very slow*. But if you've already
> > > turned off the BBWC on the RAID controller then your storage is
> > > already terribly slow and so you probably won't care about making
> > > performance even worse...
> >
> > Dave, excuse my ignorant questions
> >
> > I know the Linux kernel keeps data in cache up to 30 seconds before a
> > kernel daemon flushes it to disk, unless
> > the configured dirty ratio (which is 40% of RAM, iirc) is reached
>
> 10% of RAM, actually.
>
> > before these 30 seconds so the flush is done before it
> >
> > What I did is lower these 30 seconds to 5 seconds so every 5 seconds
> > data is flushed to disk (I've set the dirty_expire_centisecs to 500).
> > So, are there any drawbacks in doing this?
>
> Depends on your workload. For a desktop, you probably won't notice
> anything different. For a machine that creates lots of temporary
> files and then removes them (e.g. build machines) then it could
> crater performance completely because it causes writeback before the
> files are removed...
>
> > I mean, I don't care *that*
> > much for performance but I do want my dirty data to be on
> > storage in a reasonable amount of time. I looked at the various sync
> > mount options but they all are synchronous so it is my
> > impression they'll be slower than giving the kernel 5 seconds to keep
> > data and then flush it.
> >
> > From XFS perspective, I'd like to know if this is not recommended or
> > if it is? I know that with setting the above to 500 centisecs
> > means that there will be more writes to disk and potentially may
> > result in tear & wear, thus shortening the lifetime of the
> > storage
> >
> > This is a regular desktop system with a single Seagate Constellation
> > SATA disk so no RAID, LVM, thin provision or anything else
> >
> > What do you think? :)
>
> I don't think it really matters either way. I don't change
> the writeback time on my workstations, build machines or test
> machines, but I actually *increase* it on my laptops to save power
> by not writing to disk as often. So if you want a little more
> safety, then reducing the writeback timeout shouldn't have any
> significant effect on performance or wear unless you are doing
> something unusual....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
* Re: Delaylog information enquiry
From: Grozdan @ 2014-07-30 21:18 UTC (permalink / raw)
To: Dave Chinner; +Cc: Brian Foster, Frank ., xfs@oss.sgi.com
On Wed, Jul 30, 2014 at 10:18 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, Jul 30, 2014 at 07:42:32AM +0200, Grozdan wrote:
>> On Wed, Jul 30, 2014 at 1:41 AM, Dave Chinner <david@fromorbit.com> wrote:
>> > Note that this does not change file data behaviour. In this case you
>> > need to add the "sync" mount option, which forces all buffered IO to
>> > be synchronous and so will be *very slow*. But if you've already
>> > turned off the BBWC on the RAID controller then your storage is
>> > already terribly slow and so you probably won't care about making
>> > performance even worse...
>>
>> Dave, excuse my ignorant questions
>>
>> I know the Linux kernel keeps data in cache up to 30 seconds before a
>> kernel daemon flushes it to disk, unless
>> the configured dirty ratio (which is 40% of RAM, iirc) is reached
>
> 10% of RAM, actually.
>
>> before these 30 seconds so the flush is done before it
>>
>> What I did is lower these 30 seconds to 5 seconds so every 5 seconds
>> data is flushed to disk (I've set the dirty_expire_centisecs to 500).
>> So, are there any drawbacks in doing this?
>
> Depends on your workload. For a desktop, you probably won't notice
> anything different. For a machine that creates lots of temporary
> files and then removes them (e.g. build machines) then it could
> crater performance completely because it causes writeback before the
> files are removed...
>
>> I mean, I don't care *that*
>> much for performance but I do want my dirty data to be on
>> storage in a reasonable amount of time. I looked at the various sync
>> mount options but they all are synchronous so it is my
>> impression they'll be slower than giving the kernel 5 seconds to keep
>> data and then flush it.
>>
>> From XFS perspective, I'd like to know if this is not recommended or
>> if it is? I know that with setting the above to 500 centisecs
>> means that there will be more writes to disk and potentially may
>> result in tear & wear, thus shortening the lifetime of the
>> storage
>>
>> This is a regular desktop system with a single Seagate Constellation
>> SATA disk so no RAID, LVM, thin provision or anything else
>>
>> What do you think? :)
>
> I don't think it really matters either way. I don't change
> the writeback time on my workstations, build machines or test
> machines, but I actually *increase* it on my laptops to save power
> by not writing to disk as often. So if you want a little more
> safety, then reducing the writeback timeout shouldn't have any
>> significant effect on performance or wear unless you are doing
> something unusual....
>
> Cheers,
>
> Dave.
Thanks Dave :)
I don't want to start another thread as this is my last question but
it's one unrelated to the original question from Frank
One of my partitions was almost full (there were 5 GB free according to
df -h). I had about 8 torrents open in the client, all between 4 and
6 GB in size (they had all finished downloading but never got
"released" from the client, as I was seeding back). When I tried to add
a 3 GB torrent to download, the client reported that there was no space
left on the partition. I suspect this is related to speculative
preallocation: because the 8 torrents were all "open", they still had
extra space allocated by speculative preallocation, and thus I couldn't
add the 3 GB torrent even though df said there were 5 GB free when in
reality it was much less. Am I correct on this, or did something else
entirely happen?
Thanks ;)
> --
> Dave Chinner
> david@fromorbit.com
--
Yours truly
* Re: Delaylog information enquiry
From: Dave Chinner @ 2014-07-30 22:53 UTC (permalink / raw)
To: Frank .; +Cc: neutrino8@gmail.com, xfs@oss.sgi.com
On Wed, Jul 30, 2014 at 01:44:49PM +0200, Frank . wrote:
> Indeed, I turned the sync and wsync flags on. As expected, I had terribly low performance (1 MB/s for write operations), so I decided to turn them back off (and got my 100 MB/s write throughput back).
> I just wanted to reduce as much as possible the unnecessary caching between my VMs and my physical hard drives, knowing that there are up to 8 write cache levels.
> I'm getting off the subject a bit but here is the list. This is only my conclusion. I don't know if I'm right.
>
> - Guest page cache.
> - Virtual disk drive write cache. (off KVM cache=directsync)
> - Host page cache. (off KVM cache=directsync)
Pretty normal. I tend to use cache=none rather than cache=directsync
because cache=none behaves exactly like a normal disk, including
write cache behaviour. So as long as you use barriers in your guest
filesystems (xfs, ext4, btrfs all do by default) then it is
no different to running the guest on a real disk with a small
volatile write cache.
i.e. when your app/database issues a fsync() in the guest, the guest
filesystem issues a flush/fua sequence and KVM then guarantees that
it only returns when all the previously written data to that file is
on stable storage. As long as all the layers below KVM provide this
same guarantee, then you don't need to turn caches off at all.
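A sketch of what that looks like on the qemu command line (image path and memory size are placeholders):

```shell
# cache=none opens the image O_DIRECT: the host page cache is bypassed,
# but guest flush/FUA requests still pass through to the layers below,
# just like a real disk with a volatile write cache.
qemu-system-x86_64 -m 2048 \
    -drive file=/var/lib/libvirt/images/guest.img,format=raw,if=virtio,cache=none
```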
> - GlusterFS cache. (off)
> - NAS page cache. (?)
> - XFS cache (filesystem).
The gluster client side cache is being avoided due to KVM direct IO
config, the gluster server/NAS page cache/XFS cache are all the same
thing from a data perspective (i.e. 1 layer, not 3). AFAIK this is
all buffered IO, and so the only way to get data in the backing XFS
filesystem consistent on disk is for the application to issue a
fsync() on the file at the gluster client side. This comes from the
guest via KVM translating flush/fua operations or via the KVM IO
mechanism - gluster then takes care of the rest.
If KVM never issues a fsync() operation, then lower level caches
will never be flushed correctly regardless of whether you turn off
all caching or not. IOWs, fsync() is required at the XFS level to
synchronise allocation transactions with data writes, and the only
way to have that happen is for the layer above xfs to issue
f[data]sync() on the relevant XFS file(s)...
Hence you need to keep in mind that turning off high level caches
does not guarantee that low level caching behaviour will behave as
you expect - even with high level caching turned off you still need
those layers to propagate the data integrity directives from the top
of the stack to the bottom so that every layer can do the right
thing regardless of whether they are caching data or not.
i.e. caching doesn't cause data loss - it's the incorrect propagation
or non-existent use of application-level data synchronisation
primitives that causes data loss....
> - RAID controller write cache. (off)
There's no benefit to turning this off if it's battery backed - all
turning it off will do is cause performance to be horrible,
especially when you turn off all the other layers of caching above
the RAID controller.
> - Physical hard drive write cache. (off)
Right, those definitely need to be off so that the RAID controller
doesn't have internal consistency problems when power fails.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Delaylog information enquiry
From: Dave Chinner @ 2014-07-30 22:57 UTC (permalink / raw)
To: Grozdan; +Cc: Brian Foster, Frank ., xfs@oss.sgi.com
On Wed, Jul 30, 2014 at 11:18:11PM +0200, Grozdan wrote:
> I don't want to start another thread as this is my last question but
> it's one unrelated to the original question from Frank
You should always start a new thread when you have an unrelated
question. At minimum, you should change the subject line...
> One of my partitions was almost full (there was 5 GB over according to
> df -h). I had about 8 torrents open in the client, all sizes between 4
> and 6 GB (they were all downloaded and got never "released" from the
> client as I was seeding back). When I tried to add a torrent to
> download which was 3 GB, the client reported that there was no more
> space left over on the partition. I suspect this is related to
> speculative preallocation and because the 8 torrents were all "open"
> they still had extra space allocated by the speculative preallocation
> and thus I couldn't add the 3GB torrent even though df says there was
> 5GB over but in reality it was much less. Am I correct on this or is
> there something completely else that happened?
No idea - not enough information. Please start a new thread,
including the information here:
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
as well as the 'xfs_bmap -vp' output for the torrents in question.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
Thread overview: 9+ messages (newest: ~2014-07-30 22:58 UTC)
2014-07-29 8:53 Delaylog information enquiry Frank .
2014-07-29 12:38 ` Brian Foster
2014-07-29 23:41 ` Dave Chinner
2014-07-30 5:42 ` Grozdan
2014-07-30 8:18 ` Dave Chinner
2014-07-30 11:44 ` Frank .
2014-07-30 22:53 ` Dave Chinner
2014-07-30 21:18 ` Grozdan
2014-07-30 22:57 ` Dave Chinner