* Re: INFO: task btrfs-transacti:204 blocked for more than 120 seconds. (more like 8+min)
2015-07-23 19:12 INFO: task btrfs-transacti:204 blocked for more than 120 seconds. (more like 8+min) james harvey
@ 2015-07-23 19:54 ` Austin S Hemmelgarn
2015-07-24 1:11 ` Duncan
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Austin S Hemmelgarn @ 2015-07-23 19:54 UTC (permalink / raw)
To: james harvey, linux-btrfs
On 2015-07-23 15:12, james harvey wrote:
> Up to date Arch. Linux kernel 4.1.2-2. Fresh O/S install 12 days
> ago. Nowhere near full - 34G used on a 4.6T drive. 32GB memory.
>
> Installed bonnie++ 1.97-1.
>
> $ bonnie++ -d bonnie -m btrfs-disk -f -b
>
> I started trying to run with a "-s 4G" option, to use 4GB files for
> performance measuring. It refused to run, and said "file size should
> be double RAM for good results". I sighed, removed the option, and
> let it run, defaulting to **64GB files**. So, yeah, big files. But,
> I do work with Photoshop .PSB files that get that large.
>
> During the first two lines ("Writing intelligently..." and
> "Rewriting...") the filesystem seems to be completely locked out for
> anything other than bonnie++. KDE stops being able to switch focus,
> change tasks. Can switch to tty's and log in, do things like "ls",
> but attempting to write to the filesystem hangs. Can switch back to
> KDE, but screen is black with cursor until bonnie++ completes. top
> didn't show excessive CPU usage.
>
> My dmesg is at http://www.pastebin.ca/3072384 Attaching it seemed to
> make the message not go out to the list.
>
> Yes, my kernel is tainted... See "[5.310093] nvidia: module license
> 'NVIDIA' taints kernel." Sigh, it's just that the nvidia module
> license isn't GPL...
>
> The later bonnie++ writing phases ("start 'em", "Create files in
> sequential order...", "Create files in random order") show no
> detrimental effect on the system.
>
> I see some 1.5+ year old references to messages like "INFO: task
> btrfs... blocked for more than 120 seconds." With the amount of
> development since then, figured I'd pretty much ignore those and bring
> up the issue again.
>
> I think the "Writing intelligently" phase is sequential, and the old
> references I saw were regarding many re-writes sporadically in the
> middle.
>
> What I did see from years ago seemed to be that you'd have to disable
> COW where you knew there would be large files. I'm really hoping
> there's a way to avoid this type of locking, because I don't think I'd
> be comfortable knowing a non-root user could bomb the system with a
> large file in the wrong area.
>
> IF I do HAVE to disable COW, I know I can do it selectively. But, if
> I did it everywhere... Which in that situation I would, because I
> can't afford to run into many minute long lockups on a mistake... I
> lose compression, right? Do I lose snapshots? (Assume so, but hope
> I'm wrong.) What else do I lose? Is there any advantage running
> btrfs without COW anywhere over other filesystems?
>
> How would one even know where the division is between a file small
> enough to allow on btrfs, vs one not to?
>
First off, you're running on a traditional hard disk, aren't you?
That's almost certainly why the first few parts of bonnie++ effectively
hung the system. WRT that issue, there's not much advice I can give
other than to either get more and faster RAM, or get an SSD to use for
your system disk (and use the huge hard drive for data files only).
As far as NOCOW goes, you can still do snapshots, although you lose
compression, data integrity (without COW, BTRFS's built-in RAID is
actually _worse_ than other software RAID, because it can't use
checksums on the filesystem blocks), and data de-duplication. Overall,
there are still advantages to using BTRFS even with NOCOW (much easier
data migration when upgrading storage, for example; btrfs-replace is a
wonderful thing :), but most of the biggest advantages are lost.
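For reference, setting NOCOW selectively is normally done with
chattr +C on an empty directory, so that files created inside it
inherit the attribute (a rough sketch; /data/nocow is just an example
path, and the flag has no real effect on files that already contain
data):

mkdir /data/nocow
chattr +C /data/nocow      ## new files created in here will be NOCOW
lsattr -d /data/nocow      ## should show the 'C' attribute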
Also, if you can deal with not having CUDA support, you should probably
try using the nouveau driver instead of NVIDIA's proprietary one;
OpenGL (and almost every other rendering API as well) is horribly slow
on the official NVIDIA driver.
* Re: INFO: task btrfs-transacti:204 blocked for more than 120 seconds. (more like 8+min)
2015-07-23 19:12 INFO: task btrfs-transacti:204 blocked for more than 120 seconds. (more like 8+min) james harvey
2015-07-23 19:54 ` Austin S Hemmelgarn
@ 2015-07-24 1:11 ` Duncan
2015-07-30 9:09 ` Russell Coker
2015-07-30 9:00 ` Russell Coker
2015-07-30 17:51 ` Chris Murphy
3 siblings, 1 reply; 7+ messages in thread
From: Duncan @ 2015-07-24 1:11 UTC (permalink / raw)
To: linux-btrfs
james harvey posted on Thu, 23 Jul 2015 19:12:38 +0000 as excerpted:
> Up to date Arch. Linux kernel 4.1.2-2. Fresh O/S install 12 days ago.
> Nowhere near full - 34G used on a 4.6T drive. 32GB memory.
>
> Installed bonnie++ 1.97-1.
>
> $ bonnie++ -d bonnie -m btrfs-disk -f -b
>
> I started trying to run with a "-s 4G" option, to use 4GB files for
> performance measuring. It refused to run, and said "file size should be
> double RAM for good results". I sighed, removed the option, and let it
> run, defaulting to **64GB files**. So, yeah, big files. But,
> I do work with Photoshop .PSB files that get that large.
Not being a dev I won't attempt to address the btrfs problem itself, but
the below may be useful...
FWIW, there's a kernel commandline option that can be used to tell the
kernel that you have less memory than you actually do, for testing in
memory-related cases such as this. Of course it means rebooting with
that option, so it's not something you'd normally use in production, but
for testing it's an occasionally useful trick that sure beats physically
unplugging memory DIMMs! =:^)
The option is mem=nn[KMG]. You may also need memmap=, presumably
memmap=nn[KMG]$ss[KMG], to reserve the unused memory area, preventing its
use for PCI address space, since that would collide with the physical
memory that's there but unused due to mem=.
That should let you test with mem=2G, so double-memory becomes 4G. =:^)
See $KERNDIR/Documentation/kernel-parameters.txt for the details on that
and the many other available kernel commandline options.
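As a sketch, limiting that 32GB box to 2GB might look like this on the
kernel commandline (the exact reservation depends on your machine's
memory map, and the $ usually needs escaping in bootloader config
files, e.g. \$ in GRUB's /etc/default/grub):

mem=2G memmap=30G$2G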
Meanwhile, does bonnie do pre-allocation for its tests? If so, that's
likely the problem, since pre-allocation on a cow-based filesystem
doesn't work the way people are used to on overwrite-in-place based
filesystems. If there's an option for that, try turning it off and see
if your results are different.
Also, see the btrfs autodefrag mount option discussion below. It works
best with files under a quarter gig, tho some people don't see issues up
to a gig on spinning rust, more on fast ssd. There's more detail in that
later discussion.
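For reference, autodefrag is just a mount option, so trying it looks
something like this (a sketch; the device and mountpoint are
placeholders):

mount -o remount,autodefrag /mnt/data
## or persistently in /etc/fstab:
## UUID=<fs-uuid>  /mnt/data  btrfs  defaults,autodefrag  0 0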
> Yes, my kernel is tainted... See "[5.310093] nvidia: module license
> 'NVIDIA' taints kernel." Sigh, it's just that the nvidia module license
> isn't GPL...
But it's more than that. Kernel modules can do whatever the kernel can
do, and you're adding black-box code that for all the kernel devs know
could be doing /anything/ -- there must be a reason the nvidia folks
don't want to respect user rights and make the code transparent so people
can actually see what it's doing, after all, or there'd be no reason to
infringe those rights.
For some people (devs or not), this is a big issue, because they are,
after all, expecting you to waive your legal rights to damages, etc, if
it harms your system, without giving you (or devs you trust) the chance
to actually examine the code and see what it's doing before you waive
those rights. As the sig below says, if you use those programs, you're
letting them be your master.
So it's far from "just" being that the license isn't GPL. There are
technical, legal and ethical reasons to insist on being able to examine
code (or let those you trust examine it) before waiving your rights to
damages should it harm you or your property, as well as to not worry so
much about trying to debug problems when such undebuggable black-box code
is deliberately inserted in the kernel and allowed to run.
Tho in this particular case the existence of the black-box code likely
isn't interfering with the results. But it would be useful if you could
duplicate the results without that blackbox code in the picture, instead
of expecting others to do it for you. That's certainly doable at the
user level, preserving the time of the devs for actually fixing the bugs
found. =:^)
> What I did see from years ago seemed to be that you'd have to disable
> COW where you knew there would be large files. I'm really hoping
> there's a way to avoid this type of locking, because I don't think I'd
> be comfortable knowing a non-root user could bomb the system with a
> large file in the wrong area.
The problem with cow isn't large files in general, it's rewrites into the
middle of them (as opposed to append-writes). If the writes are
sequential appends, or if it's write-one-read-many, cow on large files
doesn't tend to be an issue.
But of course if you're allocating and fsyncing a file, then writing into
it, you're in effect rewriting into the middle of it, and cow again
becomes an issue. As I mentioned above, this might be the case with
bonnie, since its default assumptions would be rewrite-in-place, where
pre-allocation tends to be an optimization, not the pessimization it can
be on cow-based filesystems.
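To illustrate that pattern (a sketch, not literally what bonnie does;
the path is arbitrary): pre-allocate a file and then overwrite within
it, and from the filesystem's point of view every write is a rewrite
into the middle of an existing file, so it gets cowed:

fallocate -l 1G /data/test.img     ## reserve the space up front
dd if=/dev/zero of=/data/test.img bs=1M count=256 conv=notrunc   ## overwrite in place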
> IF I do HAVE to disable COW, I know I can do it selectively. But, if I
> did it everywhere... Which in that situation I would, because I can't
> afford to run into many minute long lockups on a mistake...
If you have to disable cow everywhere... there's far less reason to run
btrfs in the first place, since that kills many (but not all) of the
reasons you'd run it. So while possible, it's obviously not the ideal.
Tho personally, I'd rather use another filesystem for the files I'd
otherwise set nocow on btrfs, in no small measure because btrfs really
isn't fully stable and mature yet, and the loss of features due to nocow
is enough that I'd rather simply forget it and use the more stable and
mature filesystem, rather than take the additional risk of a not yet
fully stable, nocow-crippled btrfs. But I tend to be far less
partitioning-averse than many already, so I already partition up my
devices, and dedicating another partition to some other filesystem to
keep nocow files off btrfs isn't the big deal to me that it would be to
people who want to treat a single big btrfs as a big storage pool, using
subvolumes instead of partitions or lvm. Those people tend to run away
screaming from the idea of partitioning up and doing a dedicated
non-btrfs filesystem for the files in question, when they can simply set
them nocow and keep them on their big btrfs storage pool.
> I lose
> compression, right? Do I lose snapshots? (Assume so, but hope I'm
> wrong.) What else do I lose? Is there any advantage running btrfs
> without COW anywhere over other filesystems?
You lose compression, yes.
You don't lose snapshots, altho they do involve cow. A snapshot works
by locking the existing extents in place just as they are, so for a
nocow file the first write to a block after a snapshot is cowed anyway,
since the existing block is locked in place. Sometimes this is referred
to as cow1: the first write after the snapshot will cow, but after that,
until the next snapshot at least, further writes to the already-cowed
block will again rewrite in place. So the effect of snapshots on nocow
is to reduce but not eliminate the effect of nocow (which is generally
set to avoid fragmentation), tho if you're doing extreme snapshotting,
say every minute, the fragmentation avoidance of nocow is obviously
close to nullified.
You also lose checksumming and thus btrfs' data integrity features, altho
you'll still have metadata checksumming.
You still have some other features, however. As mentioned, snapshotting
still works, altho at the cost of not avoiding cow entirely (cow1).
Subvolumes still work. And the multi-device features aren't affected
except that, as mentioned, you lose the data integrity feature and thus
the ability to repair a bad copy from a good copy that normally comes
with btrfs raid1/10 (and the corresponding parity-repair with raid5/6,
tho that was only fully implemented with 3.19 and thus isn't yet as
stable and mature as raid1/10).
But basically, if you're doing global nocow, the remaining btrfs features
aren't anything special and you can get them elsewhere, say by layering
some other filesystem on top of mdraid or dmraid, and using either
partitioning or lvm in place of subvolumes.
> How would one even know where the division is between a file small
> enough to allow on btrfs, vs one not to?
The experience with btrfs' autodefrag mount option suggests that people
don't generally have any trouble with it at all to a quarter gig (256
MiB) or so, while at least on spinning rust, problems are usually
apparent at a GiB. Half to three-quarter gig is the range at which most
people start seeing issues. On a reasonably fast ssd, at a guess I'd say
the range is 2-10 GiB, or might not be hit at all, tho due to the per-gig
expense of ssd storage, in general people don't tend to use it for files
over a few GiB except in fast database use-cases where expense basically
doesn't figure in at all.
But I'd guess autodefrag would hit the interactive issues before other
usage would. So at a guess, I'd say you'd be good to a gig or two on
spinning rust, but would perhaps hit issues somewhere between 2 and 10
gig.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: INFO: task btrfs-transacti:204 blocked for more than 120 seconds. (more like 8+min)
2015-07-23 19:12 INFO: task btrfs-transacti:204 blocked for more than 120 seconds. (more like 8+min) james harvey
2015-07-23 19:54 ` Austin S Hemmelgarn
2015-07-24 1:11 ` Duncan
@ 2015-07-30 9:00 ` Russell Coker
2015-07-30 17:51 ` Chris Murphy
3 siblings, 0 replies; 7+ messages in thread
From: Russell Coker @ 2015-07-30 9:00 UTC (permalink / raw)
To: james harvey; +Cc: linux-btrfs
On Fri, 24 Jul 2015 05:12:38 AM james harvey wrote:
> I started trying to run with a "-s 4G" option, to use 4GB files for
> performance measuring. It refused to run, and said "file size should
> be double RAM for good results". I sighed, removed the option, and
> let it run, defaulting to **64GB files**. So, yeah, big files. But,
> I do work with Photoshop .PSB files that get that large.
You can use the "-r0" option to stop it insisting on twice the RAM size.
However, if the files are less than twice the RAM then the test results
will be unrealistic, as read requests will be mostly satisfied from the
cache.
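So, still aiming for a 4G test file, the original command would become
something like (a sketch, untested):

$ bonnie++ -d bonnie -m btrfs-disk -f -b -r0 -s 4G

with the caveat above that results for files smaller than twice RAM
mostly measure the page cache rather than the disk.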
> During the first two lines ("Writing intelligently..." and
> "Rewriting...") the filesystem seems to be completely locked out for
> anything other than bonnie++. KDE stops being able to switch focus,
> change tasks. Can switch to tty's and log in, do things like "ls",
> but attempting to write to the filesystem hangs. Can switch back to
> KDE, but screen is black with cursor until bonnie++ completes. top
> didn't show excessive CPU usage.
That sort of problem isn't unique to BTRFS. BTRFS has had little performance
optimisation so it might be worse than other filesystems in that regard. But
on any filesystem you can expect situations where one process that is doing
non-stop writes fills up buffers and starves other processes.
Note that when a single disk access takes 8000ms+ (more than 8 seconds),
high-level operations involving multiple files will take much longer.
> I think the "Writing intelligently" phase is sequential, and the old
> references I saw were regarding many re-writes sporadically in the
> middle.
The intelligent-write phase writes sequentially; the rewrite phase reads
and writes back sequentially.
> What I did see from years ago seemed to be that you'd have to disable
> COW where you knew there would be large files. I'm really hoping
> there's a way to avoid this type of locking, because I don't think I'd
> be comfortable knowing a non-root user could bomb the system with a
> large file in the wrong area.
Disabling CoW won't solve all issues related to sharing disk IO capacity
between users. Also disabling CoW will remove all BTRFS benefits apart from
subvols, and subvols aren't that useful when snapshots aren't an option.
> IF I do HAVE to disable COW, I know I can do it selectively. But, if
> I did it everywhere... Which in that situation I would, because I
> can't afford to run into many minute long lockups on a mistake... I
> lose compression, right? Do I lose snapshots? (Assume so, but hope
> I'm wrong.) What else do I lose? Is there any advantage running
> btrfs without COW anywhere over other filesystems?
I believe that when you disable CoW and then make a snapshot, there will
still be one CoW stage for each block the first time it's written, when
it gets copied somewhere else.
> How would one even know where the division is between a file small
> enough to allow on btrfs, vs one not to?
http://doc.coker.com.au/projects/memlockd/
If a hostile user wrote a program that used fsync() they could reproduce such
problems with much smaller files. My memlockd program alleviates such problems
by locking the pages of important programs and libraries into RAM.
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
* Re: INFO: task btrfs-transacti:204 blocked for more than 120 seconds. (more like 8+min)
2015-07-23 19:12 INFO: task btrfs-transacti:204 blocked for more than 120 seconds. (more like 8+min) james harvey
` (2 preceding siblings ...)
2015-07-30 9:00 ` Russell Coker
@ 2015-07-30 17:51 ` Chris Murphy
2015-07-30 17:55 ` Chris Murphy
3 siblings, 1 reply; 7+ messages in thread
From: Chris Murphy @ 2015-07-30 17:51 UTC (permalink / raw)
To: james harvey; +Cc: Btrfs BTRFS
On Thu, Jul 23, 2015 at 1:12 PM, james harvey <jamespharvey20@gmail.com> wrote:
> Up to date Arch. Linux kernel 4.1.2-2. Fresh O/S install 12 days
> ago. Nowhere near full - 34G used on a 4.6T drive. 32GB memory.
>
> Installed bonnie++ 1.97-1.
>
> $ bonnie++ -d bonnie -m btrfs-disk -f -b
>
> I started trying to run with a "-s 4G" option, to use 4GB files for
> performance measuring. It refused to run, and said "file size should
> be double RAM for good results". I sighed, removed the option, and
> let it run, defaulting to **64GB files**. So, yeah, big files. But,
> I do work with Photoshop .PSB files that get that large.
>
> During the first two lines ("Writing intelligently..." and
> "Rewriting...") the filesystem seems to be completely locked out for
> anything other than bonnie++. KDE stops being able to switch focus,
> change tasks. Can switch to tty's and log in, do things like "ls",
> but attempting to write to the filesystem hangs. Can switch back to
> KDE, but screen is black with cursor until bonnie++ completes. top
> didn't show excessive CPU usage.
>
> My dmesg is at http://www.pastebin.ca/3072384 Attaching it seemed to
> make the message not go out to the list.
I can't tell what actually instigates this, as there are several blocked tasks.
INFO: task btrfs-cleaner:203 blocked for more than 120 seconds.
INFO: task btrfs-transacti:204 blocked for more than 120 seconds.
My suggestion is to file a bug. Include all of the system specs and a
more concise set of reproduction steps from above, but then also include
sysrq-w output. This will dump into the kernel messages; if it's too big
it might overflow the kernel message buffer. You can use the
log_buf_len=1M boot parameter to increase it, or if this is a systemd
system, all of it is in journalctl -k and you don't need to worry about
the buffer size.
https://www.kernel.org/doc/Documentation/sysrq.txt
So that'd be:
# echo 1 > /proc/sys/kernel/sysrq
## reproduce the problem
# echo w > /proc/sysrq-trigger
I never know if a developer wants w or t output, but for blocked tasks
I do a w and post that as an attachment, and then sometimes also a
separate cut/paste attachment of the t output.
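Something like this should capture the dump for attaching (a sketch; on
a non-systemd system plain dmesg works the same way):

# journalctl -k > sysrq-w.txt    ## or: dmesg > sysrq-w.txt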
> IF I do HAVE to disable COW, I know I can do it selectively.
I don't think you need to worry about this for your use case. You're
talking about big Photoshop files, and it's good to COW those, because
if the file server crashes while writing out a change, the old file is
still available and not corrupt, since the write didn't finish. Of
course you lose any changes since the last save, but that's always
true. With non-COW filesystems that overwrite in place, or with the
files set to nocow on btrfs, what happens in addition is that a crash
or hang during an overwrite damages the file, often irrecoverably. So
here COW is good. And you should test with it too, in order to have a
fair test.
--
Chris Murphy