* Periodic frame losses when recording to btrfs volume with OBS
@ 2018-01-20 10:47 Sebastian Ochmann
2018-01-21 10:04 ` Qu Wenruo
2018-01-22 14:27 ` Chris Mason
0 siblings, 2 replies; 15+ messages in thread
From: Sebastian Ochmann @ 2018-01-20 10:47 UTC (permalink / raw)
To: linux-btrfs
Hello,
I would like to describe a real-world use case where btrfs does not
perform well for me. I'm recording 60 fps, larger-than-1080p video using
OBS Studio [1] where it is important that the video stream is encoded
and written out to disk in real-time for a prolonged period of time (2-5
hours). The result is a H264 video encoded on the GPU with a data rate
ranging from approximately 10-50 MB/s.
The hardware used is powerful enough to handle this task. When I use a
XFS volume for recording, no matter whether it's a SSD or HDD, the
recording is smooth and no frame drops are reported (OBS has a nice
Stats window where it shows the number of frames dropped due to encoding
lag which seemingly also includes writing the data out to disk).
However, when using a btrfs volume I quickly observe severe, periodic
frame drops. It's not single frames but larger chunks of frames that are
dropped at a time. I tried mounting the volume with nobarrier but to no
avail.
Of course, the simple fix is to use a FS that works for me(TM). However
I thought since this is a common real-world use case I'd describe the
symptoms here in case anyone is interested in analyzing this behavior.
It's not immediately obvious that the FS makes such a difference. Also,
if anyone has an idea what I could try to mitigate this issue (mount or
mkfs options?) I can try that.
I saw this behavior on two different machines with kernels 4.14.13 and
4.14.5, both Arch Linux. btrfs-progs 4.14, OBS 20.1.3-241-gf5c3af1b
built from git.
Best regards
Sebastian
[1] https://github.com/jp9000/obs-studio
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-20 10:47 Periodic frame losses when recording to btrfs volume with OBS Sebastian Ochmann
@ 2018-01-21 10:04 ` Qu Wenruo
2018-01-21 15:27 ` Sebastian Ochmann
2018-01-22 14:27 ` Chris Mason
1 sibling, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2018-01-21 10:04 UTC (permalink / raw)
To: Sebastian Ochmann, linux-btrfs
[-- Attachment #1.1: Type: text/plain, Size: 2673 bytes --]
On 2018年01月20日 18:47, Sebastian Ochmann wrote:
> Hello,
>
> I would like to describe a real-world use case where btrfs does not
> perform well for me. I'm recording 60 fps, larger-than-1080p video using
> OBS Studio [1] where it is important that the video stream is encoded
> and written out to disk in real-time for a prolonged period of time (2-5
> hours). The result is a H264 video encoded on the GPU with a data rate
> ranging from approximately 10-50 MB/s.
>
> The hardware used is powerful enough to handle this task. When I use a
> XFS volume for recording, no matter whether it's a SSD or HDD, the
> recording is smooth and no frame drops are reported (OBS has a nice
> Stats window where it shows the number of frames dropped due to encoding
> lag which seemingly also includes writing the data out to disk).
>
> However, when using a btrfs volume I quickly observe severe, periodic
> frame drops. It's not single frames but larger chunks of frames that are
> dropped at a time. I tried mounting the volume with nobarrier but to no
> avail.
What's the drop interval? Something near 30s?
If so, try mount option commit=300 to see if it helps.
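(A minimal sketch of applying that suggestion; the /mnt/rec mount point is an
assumption, taken from the mount output later in this thread:)

mount -o remount,commit=300 /mnt/rec    # raise the commit interval to 300s
mount | grep /mnt/rec                   # verify the active options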
>
> Of course, the simple fix is to use a FS that works for me(TM). However
> I thought since this is a common real-world use case I'd describe the
> symptoms here in case anyone is interested in analyzing this behavior.
> It's not immediately obvious that the FS makes such a difference. Also,
> if anyone has an idea what I could try to mitigate this issue (mount or
> mkfs options?) I can try that.
Mkfs options can help, but only marginally AFAIK.
You could try mkfs with -n 4K (the minimal supported nodesize) to reduce
the tree lock critical region a little, at the cost of more metadata
fragmentation.
And are there any special features enabled, like quota?
Or a scheduled balance running in the background?
Both are known to dramatically impact the performance of transaction
commits, so it's recommended to disable quota and scheduled balance first.
Another recommendation is to use the nodatacow mount option to reduce the
CoW metadata overhead, but I have doubts about its effectiveness.
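(A rough sketch of the above suggestions; /dev/sdc1 and /mnt/rec are
placeholders, and mkfs of course destroys the existing filesystem:)

mkfs.btrfs -n 4096 /dev/sdc1             # minimal metadata nodesize
mount -o nodatacow /dev/sdc1 /mnt/rec    # disable data CoW for new writes
btrfs quota disable /mnt/rec             # make sure quota stays off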
Thanks,
Qu
>
> I saw this behavior on two different machines with kernels 4.14.13 and
> 4.14.5, both Arch Linux. btrfs-progs 4.14, OBS 20.1.3-241-gf5c3af1b
> built from git.
>
> Best regards
> Sebastian
>
> [1] https://github.com/jp9000/obs-studio
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-21 10:04 ` Qu Wenruo
@ 2018-01-21 15:27 ` Sebastian Ochmann
2018-01-21 22:05 ` Chris Murphy
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Sebastian Ochmann @ 2018-01-21 15:27 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 21.01.2018 11:04, Qu Wenruo wrote:
>
>
> On 2018年01月20日 18:47, Sebastian Ochmann wrote:
>> Hello,
>>
>> I would like to describe a real-world use case where btrfs does not
>> perform well for me. I'm recording 60 fps, larger-than-1080p video using
>> OBS Studio [1] where it is important that the video stream is encoded
>> and written out to disk in real-time for a prolonged period of time (2-5
>> hours). The result is a H264 video encoded on the GPU with a data rate
>> ranging from approximately 10-50 MB/s.
>
>>
>> The hardware used is powerful enough to handle this task. When I use a
>> XFS volume for recording, no matter whether it's a SSD or HDD, the
>> recording is smooth and no frame drops are reported (OBS has a nice
>> Stats window where it shows the number of frames dropped due to encoding
>> lag which seemingly also includes writing the data out to disk).
>>
>> However, when using a btrfs volume I quickly observe severe, periodic
>> frame drops. It's not single frames but larger chunks of frames that are
>> dropped at a time. I tried mounting the volume with nobarrier but to no
>> avail.
>
> What's the drop interval? Something near 30s?
> If so, try mount option commit=300 to see if it helps.
Thank you for your reply. I observed the interval more closely and it
shows that the first, quite small drop occurs about 10 seconds after
starting the recording (some initial metadata being written?). After
that, the interval is indeed about 30 seconds with large drops each time.
Thus I tried setting the commit option to different values. I confirmed
that the setting was activated by looking at the options "mount" shows
(see below). However, no matter whether I set the commit interval to
300, 60 or 10 seconds, the results were always similar. About every 30
seconds the drive shows activity for a few seconds and the drop occurs
shortly thereafter. It almost seems like the commit setting doesn't have
any effect. By the way, the machine I'm currently testing on has 64 GB
of RAM so it should have plenty of room for caching.
>>
>> Of course, the simple fix is to use a FS that works for me(TM). However
>> I thought since this is a common real-world use case I'd describe the
>> symptoms here in case anyone is interested in analyzing this behavior.
>> It's not immediately obvious that the FS makes such a difference. Also,
>> if anyone has an idea what I could try to mitigate this issue (mount or
>> mkfs options?) I can try that.
>
> Mkfs.options can help, but only marginally AFAIK.
>
> You could try mkfs with -n 4K (minimal supported nodesize), to reduce
> the tree lock critical region by a little, at the cost of more metadata
> fragmentation.
>
> And is there any special features enabled like quota?
> Or scheduled balance running at background?
> Which is known to dramatically impact performance of transaction
> commitment, so it's recommended to disable quota/scheduled balance first.
>
>
> Another recommendation is to use nodatacow mount option to reduce the
> CoW metadata overhead, but I doubt about the effectiveness.
I tried the -n 4K and nodatacow options, but it doesn't seem to make a
big difference, if at all. No quota or auto-balance is active. It's
basically using Arch Linux default options.
The output of "mount" after setting 10 seconds commit interval:
/dev/sdc1 on /mnt/rec type btrfs
(rw,relatime,space_cache,commit=10,subvolid=5,subvol=/)
Also tried noatime, but didn't make a difference either.
Best regards
Sebastian
> Thanks,
> Qu >>
>> I saw this behavior on two different machines with kernels 4.14.13 and
>> 4.14.5, both Arch Linux. btrfs-progs 4.14, OBS 20.1.3-241-gf5c3af1b
>> built from git.
>>
>> Best regards
>> Sebastian
>>
>> [1] https://github.com/jp9000/obs-studio
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-21 15:27 ` Sebastian Ochmann
@ 2018-01-21 22:05 ` Chris Murphy
[not found] ` <CAJCQCtQOTNZZnkiw2Tq9Mgwnc4pykbOjCb2DCOm4iCjn5K9jQw@mail.gmail.com>
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Chris Murphy @ 2018-01-21 22:05 UTC (permalink / raw)
Cc: Qu Wenruo, Btrfs BTRFS
On Sun, Jan 21, 2018 at 8:27 AM, Sebastian Ochmann
<ochmann@cs.uni-bonn.de> wrote:
> On 21.01.2018 11:04, Qu Wenruo wrote:
> The output of "mount" after setting 10 seconds commit interval:
>
> /dev/sdc1 on /mnt/rec type btrfs
> (rw,relatime,space_cache,commit=10,subvolid=5,subvol=/)
I wonder if it gets stuck updating v1 space cache. Instead of trying
v2, you could try nospace_cache mount option and see if there's a
change in behavior.
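(Sketch, reusing the device and mount point quoted above; nospace_cache has
to be set when mounting:)

umount /mnt/rec
mount -o nospace_cache /dev/sdc1 /mnt/rec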
---
Chris Murphy
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
[not found] ` <CAJCQCtQOTNZZnkiw2Tq9Mgwnc4pykbOjCb2DCOm4iCjn5K9jQw@mail.gmail.com>
@ 2018-01-21 22:33 ` Sebastian Ochmann
0 siblings, 0 replies; 15+ messages in thread
From: Sebastian Ochmann @ 2018-01-21 22:33 UTC (permalink / raw)
To: Chris Murphy, linux-btrfs
On 21.01.2018 23:04, Chris Murphy wrote:
> On Sun, Jan 21, 2018 at 8:27 AM, Sebastian Ochmann
> <ochmann@cs.uni-bonn.de> wrote:
>> On 21.01.2018 11:04, Qu Wenruo wrote:
>
>> The output of "mount" after setting 10 seconds commit interval:
>>
>> /dev/sdc1 on /mnt/rec type btrfs
>> (rw,relatime,space_cache,commit=10,subvolid=5,subvol=/)
>
> I wonder if it gets stuck updating v1 space cache. Instead of trying
> v2, you could try nospace_cache mount option and see if there's a
> change in behavior.
I tried disabling space_cache, also on a newly formatted volume when
first mounting it. However, it doesn't seem to make a difference.
Basically the same lags in the same interval, sorry.
Best regards
Sebastian
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-21 15:27 ` Sebastian Ochmann
2018-01-21 22:05 ` Chris Murphy
[not found] ` <CAJCQCtQOTNZZnkiw2Tq9Mgwnc4pykbOjCb2DCOm4iCjn5K9jQw@mail.gmail.com>
@ 2018-01-22 0:39 ` Qu Wenruo
2018-01-22 9:19 ` Nikolay Borisov
2018-01-22 18:33 ` Sebastian Ochmann
2018-01-22 8:59 ` Duncan
3 siblings, 2 replies; 15+ messages in thread
From: Qu Wenruo @ 2018-01-22 0:39 UTC (permalink / raw)
To: Sebastian Ochmann, linux-btrfs
[-- Attachment #1.1: Type: text/plain, Size: 5045 bytes --]
On 2018年01月21日 23:27, Sebastian Ochmann wrote:
> On 21.01.2018 11:04, Qu Wenruo wrote:
>>
>>
>> On 2018年01月20日 18:47, Sebastian Ochmann wrote:
>>> Hello,
>>>
>>> I would like to describe a real-world use case where btrfs does not
>>> perform well for me. I'm recording 60 fps, larger-than-1080p video using
>>> OBS Studio [1] where it is important that the video stream is encoded
>>> and written out to disk in real-time for a prolonged period of time (2-5
>>> hours). The result is a H264 video encoded on the GPU with a data rate
>>> ranging from approximately 10-50 MB/s.
>>
>>>
>>> The hardware used is powerful enough to handle this task. When I use a
>>> XFS volume for recording, no matter whether it's a SSD or HDD, the
>>> recording is smooth and no frame drops are reported (OBS has a nice
>>> Stats window where it shows the number of frames dropped due to encoding
>>> lag which seemingly also includes writing the data out to disk).
>>>
>>> However, when using a btrfs volume I quickly observe severe, periodic
>>> frame drops. It's not single frames but larger chunks of frames that are
>>> dropped at a time. I tried mounting the volume with nobarrier but to no
>>> avail.
>>
>> What's the drop interval? Something near 30s?
>> If so, try mount option commit=300 to see if it helps.
>
> Thank you for your reply. I observed the interval more closely and it
> shows that the first, quite small drop occurs about 10 seconds after
> starting the recording (some initial metadata being written?). After
> that, the interval is indeed about 30 seconds with large drops each time.
This almost confirms my assumption about transaction commit performance.
But...
>
> Thus I tried setting the commit option to different values. I confirmed
> that the setting was activated by looking at the options "mount" shows
> (see below). However, no matter whether I set the commit interval to
> 300, 60 or 10 seconds, the results were always similar. About every 30
> seconds the drive shows activity for a few seconds and the drop occurs
> shortly thereafter.
Either the commit= mount option has a bug, or this is some unrelated problem.
As you mentioned, the output is about 10~50 MiB/s, so 30s means 300~1500 MiB.
Maybe it's related to the dirty data amount?
Would you please verify whether a lower or higher profile (resulting in a
much larger or smaller data stream) affects the behavior?
Despite that, I'll dig to see if commit= option has any bug.
And you could also try the nospace_cache mount option provided by Chris
Murphy, which may also help.
Thanks,
Qu
> It almost seems like the commit setting doesn't have
> any effect. By the way, the machine I'm currently testing on has 64 GB
> of RAM so it should have plenty of room for caching.
>
>>>
>>> Of course, the simple fix is to use a FS that works for me(TM). However
>>> I thought since this is a common real-world use case I'd describe the
>>> symptoms here in case anyone is interested in analyzing this behavior.
>>> It's not immediately obvious that the FS makes such a difference. Also,
>>> if anyone has an idea what I could try to mitigate this issue (mount or
>>> mkfs options?) I can try that.
>>
>> Mkfs.options can help, but only marginally AFAIK.
>>
>> You could try mkfs with -n 4K (minimal supported nodesize), to reduce
>> the tree lock critical region by a little, at the cost of more metadata
>> fragmentation.
>>
>> And is there any special features enabled like quota?
>> Or scheduled balance running at background?
>> Which is known to dramatically impact performance of transaction
>> commitment, so it's recommended to disable quota/scheduled balance first.
>>
>>
>> Another recommendation is to use nodatacow mount option to reduce the
>> CoW metadata overhead, but I doubt about the effectiveness.
>
> I tried the -n 4K and nodatacow options, but it doesn't seem to make a
> big difference, if at all. No quota or auto-balance is active. It's
> basically using Arch Linux default options.
>
> The output of "mount" after setting 10 seconds commit interval:
>
> /dev/sdc1 on /mnt/rec type btrfs
> (rw,relatime,space_cache,commit=10,subvolid=5,subvol=/)
>
> Also tried noatime, but didn't make a difference either.
>
> Best regards
> Sebastian
>
>> Thanks,
>> Qu >>
>>> I saw this behavior on two different machines with kernels 4.14.13 and
>>> 4.14.5, both Arch Linux. btrfs-progs 4.14, OBS 20.1.3-241-gf5c3af1b
>>> built from git.
>>>
>>> Best regards
>>> Sebastian
>>>
>>> [1] https://github.com/jp9000/obs-studio
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe
>>> linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-21 15:27 ` Sebastian Ochmann
` (2 preceding siblings ...)
2018-01-22 0:39 ` Qu Wenruo
@ 2018-01-22 8:59 ` Duncan
2018-01-23 8:38 ` ein
3 siblings, 1 reply; 15+ messages in thread
From: Duncan @ 2018-01-22 8:59 UTC (permalink / raw)
To: linux-btrfs
Sebastian Ochmann posted on Sun, 21 Jan 2018 16:27:55 +0100 as excerpted:
> On 21.01.2018 11:04, Qu Wenruo wrote:
>>
>>
>> On 2018年01月20日 18:47, Sebastian Ochmann wrote:
>>> Hello,
>>>
>>> I would like to describe a real-world use case where btrfs does not
>>> perform well for me. I'm recording 60 fps, larger-than-1080p video
>>> using OBS Studio [1] where it is important that the video stream is
>>> encoded and written out to disk in real-time for a prolonged period of
>>> time (2-5 hours). The result is a H264 video encoded on the GPU with a
>>> data rate ranging from approximately 10-50 MB/s.
>>
>>
>>> The hardware used is powerful enough to handle this task. When I use a
>>> XFS volume for recording, no matter whether it's a SSD or HDD, the
>>> recording is smooth and no frame drops are reported (OBS has a nice
>>> Stats window where it shows the number of frames dropped due to
>>> encoding lag which seemingly also includes writing the data out to
>>> disk).
>>>
>>> However, when using a btrfs volume I quickly observe severe, periodic
>>> frame drops. It's not single frames but larger chunks of frames that are
>>> dropped at a time. I tried mounting the volume with nobarrier but to
>>> no avail.
>>
>> What's the drop interval? Something near 30s?
>> If so, try mount option commit=300 to see if it helps.
>
> Thank you for your reply. I observed the interval more closely and it
> shows that the first, quite small drop occurs about 10 seconds after
> starting the recording (some initial metadata being written?). After
> that, the interval is indeed about 30 seconds with large drops each
> time.
>
> Thus I tried setting the commit option to different values. I confirmed
> that the setting was activated by looking at the options "mount" shows
> (see below). However, no matter whether I set the commit interval to
> 300, 60 or 10 seconds, the results were always similar. About every 30
> seconds the drive shows activity for a few seconds and the drop occurs
> shortly thereafter. It almost seems like the commit setting doesn't have
> any effect. By the way, the machine I'm currently testing on has 64 GB
> of RAM so it should have plenty of room for caching.
64 GB RAM...
Do you know about the /proc/sys/vm/dirty_* files and how to use/tweak
them? If not, read $KERNDIR/Documentation/sysctl/vm.txt, focusing on
these files.
These tunables control the amount of writeback cache that is allowed to
accumulate before the system starts flushing it. The problem is that the
defaults for these tunables were selected back when system memory
normally measured in the MiB, not the GiB of today, so the default ratios
allow too much dirty data to accumulate before attempting to flush it to
storage, resulting in flush storms that hog the available IO and starve
other tasks that might be trying to use it.
The fix is to tweak these settings to try to smooth things out, starting
background flush earlier, so with a bit of luck the system never hits
high priority foreground flush mode, or if it does there's not so much to
be written as much of it has already been done in the background.
There are five files, two pairs of files, one pair controlling foreground
sizes, the other background, and one file setting the time limit. The
sizes can be set either by ratio (a percentage of RAM) or by bytes, with the
unused one appearing as zero when read.
To set these temporarily you write to the appropriate file. Once you
have a setting that works well for you, write it to your distro's sysctl
configuration (/etc/sysctl.conf or /etc/sysctrl.d/*.conf, usually), and
it should be automatically applied at boot for you.
Here's the settings in my /etc/sysctl.conf, complete with notes about the
defaults and the values I've chosen for my 16G of RAM. Note that while I
have fast ssds now, I set these values back when I had spinning rust. I
was happy with them then, and while I shouldn't really need the settings
on my ssds, I've seen no reason to change them.
At 16G, 1% ~ 160M. At 64G, it'd be four times larger, 640M, likely too
chunky a granularity to be useful, so you'll probably want to set the
bytes value instead of ratio.
# write-cache, foreground/background flushing
# vm.dirty_ratio = 10 (% of RAM)
# make it 3% of 16G ~ half a gig
vm.dirty_ratio = 3
# vm.dirty_bytes = 0
# vm.dirty_background_ratio = 5 (% of RAM)
# make it 1% of 16G ~ 160 M
vm.dirty_background_ratio = 1
# vm.dirty_background_bytes = 0
# vm.dirty_expire_centisecs = 2999 (30 sec)
# vm.dirty_writeback_centisecs = 499 (5 sec)
# make it 10 sec
vm.dirty_writeback_centisecs = 1000
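(For completeness, settings placed there can also be re-applied at runtime,
without a reboot:)

sysctl -p          # re-read /etc/sysctl.conf
sysctl --system    # or re-read all fragments, including /etc/sysctl.d/*.conf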
Now the other factor in the picture is how fast your actual hardware can
write. hdparm's -t parameter actually tests sequential read speed, but it
still gives some idea of raw device throughput. You'll need to run it as root:
hdparm -t /dev/sda
/dev/sda:
Timing buffered disk reads: 1578 MB in 3.00 seconds = 525.73 MB/sec
... Like I said, fast ssd... I believe fast modern spinning rust should
be 100 MB/sec or so, tho slower devices may only do 30 MB/sec, likely too
slow for your reported 10-50 MB/sec stream, tho you say yours should be
fast enough as it's fine with xfs.
Now here's the problem. As Qu mentions elsewhere on-thread, 30 seconds
of your 10-50 MB/sec stream is 300-1500 MiB. Say your available device
IO bandwidth is 100 MiB/sec. That should be fine. But the default
dirty_* settings allow 5% of RAM in dirty writeback cache before even
starting low priority background flush, while it won't kick to high
priority until 10% of RAM or 30 seconds, whichever comes first.
And at 64 GiB RAM, 1% is as I said, about 640 MiB, so 10% is 6.4 GB dirty
before it kicks to high priority, and 3.2 GB is the 5% accumulation
before it even starts low priority background writing. That's assuming
the 30 second timeout hasn't expired yet, of course.
But as we established above the write stream maxes out at ~1.5 GiB in 30
seconds, and that's well below the ~3.2 GiB @ 64 GiB RAM that would kick
in even low priority background writeback!
So at the defaults, the background writeback never kicks in at all, until
the 30 second timeout expires, forcing immediate high priority foreground
flushing!
Meanwhile, the way the kernel handles /background/ writeback flushing is
that it will take the opportunity to writeback what it can while the
device is idle. But as we've just established, background never kicks in.
So then the timeout expires and the kernel kicks in high priority
foreground writeback.
And the kernel handles foreground writeback *MUCH* differently!
Basically, it stops anything attempting to dirty more writeback cache
until it can write the dirty cache out. And it charges the time it
spends doing just that to the thread it stopped in order to do that
high priority writeback!
Now as designed this should work well, and it does when the dirty_*
values are set correctly, because any process that's trying to dirty the
writeback cache faster than it can be written out, thus kicking in
foreground mode, gets stopped until the data can be written out, thus
preventing it from dirtying even MORE cache faster than the system can
handle it, which in /theory/ is what kicked it into high priority
foreground mode in the /first/ place.
But as I said, the default ratios were selected when memory was far
smaller. With half a gig of RAM, the default 5% to kick in background
mode would be only ~25 MiB, easily writable within a second on modern
devices and back then, still writable within say 5-10 seconds. And if it
ever reached foreground mode, that would still be only 50 MiB worth, and
it would still complete in well under the 30 seconds before the next
expiry.
But with modern RAM levels, my 16 GiB to some extent and your 64 GiB is
even worse, as we've seen, even our max ~1500 MiB doesn't kick in
background writeback mode, so the stuff just sits there until it expires
and then it gets slammed into high priority foreground mode, stopping your
streaming until it has a chance to write some of that dirty data out.
And at our assumed 100 MiB/sec IO bandwidth, that 300-1500 MiB is going
to take 3-15 seconds to write out, well within the 30 seconds before the
next expiry, but for a time-critical streaming app, stopping it even the
minimal 3 seconds is very likely to drop frames!
So try setting something a bit more reasonable and see if it helps. That
1% ratio at 16 GiB RAM for ~160 MB was fine for me, but I'm not doing
critical streaming, and at 64 GiB you're looking at ~640 MB per 1%, as I
said, too chunky. For streaming, I'd suggest something approaching the
value of your per-second IO bandwidth, we're assuming 100 MB/sec here so
100 MiB but let's round that up to a nice binary 128 MiB, for the
background value, perhaps half a GiB or 5 seconds worth of writeback time
for foreground, 4 times the background value. So:
vm.dirty_background_bytes = 134217728 # 128*1024*1024, 128 MiB
vm.dirty_bytes = 536870912 # 512*1024*1024, 512 MiB
As mentioned, try writing those values directly into /proc/sys/vm/
dirty_background_bytes and dirty_bytes first, to see if it helps. If
my guess is correct, that should vastly improve the situation for you.
If it does but not quite enough or you just want to try tweaking some
more, you can tweak it from there, but those are reasonable starting
values and really should work far better than the default 5% and 10% of
RAM with 64 GiB of it!
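(A sketch of doing exactly that with the values above, as root:)

echo 134217728 > /proc/sys/vm/dirty_background_bytes   # 128 MiB
echo 536870912 > /proc/sys/vm/dirty_bytes              # 512 MiB
grep . /proc/sys/vm/dirty_bytes /proc/sys/vm/dirty_background_bytes
# the corresponding *_ratio files should now read back as 0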
Other things to try tweaking include the IO scheduler -- the default is
the venerable CFQ but deadline may well be better for a streaming use-
case, and now there's the new multi-queue stuff and the multi-queue kyber
and bfq schedulers, as well -- and setting IO priority -- probably by
increasing the IO priority of the streaming app. The tool to use for the
latter is called ionice. Do note, however, that not all schedulers
implement IO priorities. CFQ does, but while I think deadline should
work better for the streaming use-case, it's simpler code and I don't
believe it implements IO priority. Similarly for multi-queue, I'd guess
the low-code-designed-for-fast-direct-PCIE-connected-SSD kyber doesn't
implement IO priorities, while the more complex and general purpose
suitable-for-spinning-rust bfq /might/ implement IO priorities.
But I know less about that stuff and it's googlable, should you decide to
try playing with it too. I know what the dirty_* stuff does from
personal experience. =:^)
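(A rough sketch of both knobs; sdc and the obs process name are assumptions:)

cat /sys/block/sdc/queue/scheduler       # list available schedulers, [current]
echo deadline > /sys/block/sdc/queue/scheduler   # or mq-deadline/bfq on blk-mq
ionice -c 2 -n 0 -p $(pidof obs)         # best-effort class, highest priority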
And to tie up a loose end, xfs has somewhat different design principles
and may well not be particularly sensitive to the dirty_* settings, while
btrfs, due to COW and other design choices, is likely more sensitive to
them than the widely used ext* and reiserfs (my old choice and the basis
of my own settings, above).
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-22 0:39 ` Qu Wenruo
@ 2018-01-22 9:19 ` Nikolay Borisov
2018-01-22 18:33 ` Sebastian Ochmann
1 sibling, 0 replies; 15+ messages in thread
From: Nikolay Borisov @ 2018-01-22 9:19 UTC (permalink / raw)
To: Qu Wenruo, Sebastian Ochmann, linux-btrfs
On 22.01.2018 02:39, Qu Wenruo wrote:
>
>
> On 2018年01月21日 23:27, Sebastian Ochmann wrote:
>> On 21.01.2018 11:04, Qu Wenruo wrote:
>>>
>>>
>>> On 2018年01月20日 18:47, Sebastian Ochmann wrote:
>>>> Hello,
>>>>
>>>> I would like to describe a real-world use case where btrfs does not
>>>> perform well for me. I'm recording 60 fps, larger-than-1080p video using
>>>> OBS Studio [1] where it is important that the video stream is encoded
>>>> and written out to disk in real-time for a prolonged period of time (2-5
>>>> hours). The result is a H264 video encoded on the GPU with a data rate
>>>> ranging from approximately 10-50 MB/s.
>>>
>>>>
>>>> The hardware used is powerful enough to handle this task. When I use a
>>>> XFS volume for recording, no matter whether it's a SSD or HDD, the
>>>> recording is smooth and no frame drops are reported (OBS has a nice
>>>> Stats window where it shows the number of frames dropped due to encoding
>>>> lag which seemingly also includes writing the data out to disk).
>>>>
>>>> However, when using a btrfs volume I quickly observe severe, periodic
>>>> frame drops. It's not single frames but larger chunks of frames that are
>>>> dropped at a time. I tried mounting the volume with nobarrier but to no
>>>> avail.
>>>
>>> What's the drop interval? Something near 30s?
>>> If so, try mount option commit=300 to see if it helps.
>>
>> Thank you for your reply. I observed the interval more closely and it
>> shows that the first, quite small drop occurs about 10 seconds after
>> starting the recording (some initial metadata being written?). After
>> that, the interval is indeed about 30 seconds with large drops each time.
>
> This almost proves my assumption to transaction commitment performance.
>
> But...
>
>>
>> Thus I tried setting the commit option to different values. I confirmed
>> that the setting was activated by looking at the options "mount" shows
>> (see below). However, no matter whether I set the commit interval to
>> 300, 60 or 10 seconds, the results were always similar. About every 30
>> seconds the drive shows activity for a few seconds and the drop occurs
>> shortly thereafter.
>
> Either such mount option has a bug, or some unrelated problem.
Looking at transaction_kthread, the schedule there is done in
TASK_INTERRUPTIBLE, so it's entirely possible that the kthread may be
woken up earlier. The code has been there since 2008. I'm going to run
some write-heavy tests to see how often the transaction kthread is woken
up before its timeout has elapsed.
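(One rough way to watch that from userspace would be counting commits per
interval with the bcc funccount tool from the iovisor suite mentioned
elsewhere in this thread; the tool path and the 5s interval are assumptions:)

/usr/share/bcc/tools/funccount -i 5 btrfs_commit_transaction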
>
> As you mentioned the output is about 10~50MiB/s, 30s means 300~1500MiBs.
> Maybe it's related to the dirty data amount?
>
> Would you please verify if a lower or higher profile (resulting much
> larger or smaller data stream) would affect?
>
>
> Despite that, I'll dig to see if commit= option has any bug.
>
> And you could also try the nospace_cache mount option provided by Chris
> Murphy, which may also help.
>
> Thanks,
> Qu
>
>> It almost seems like the commit setting doesn't have
>> any effect. By the way, the machine I'm currently testing on has 64 GB
>> of RAM so it should have plenty of room for caching.
>>
>>>>
>>>> Of course, the simple fix is to use a FS that works for me(TM). However
>>>> I thought since this is a common real-world use case I'd describe the
>>>> symptoms here in case anyone is interested in analyzing this behavior.
>>>> It's not immediately obvious that the FS makes such a difference. Also,
>>>> if anyone has an idea what I could try to mitigate this issue (mount or
>>>> mkfs options?) I can try that.
>>>
>>> Mkfs.options can help, but only marginally AFAIK.
>>>
>>> You could try mkfs with -n 4K (minimal supported nodesize), to reduce
>>> the tree lock critical region by a little, at the cost of more metadata
>>> fragmentation.
>>>
>>> And is there any special features enabled like quota?
>>> Or scheduled balance running at background?
>>> Which is known to dramatically impact performance of transaction
>>> commitment, so it's recommended to disable quota/scheduled balance first.
>>>
>>>
>>> Another recommendation is to use nodatacow mount option to reduce the
>>> CoW metadata overhead, but I doubt about the effectiveness.
>>
>> I tried the -n 4K and nodatacow options, but it doesn't seem to make a
>> big difference, if at all. No quota or auto-balance is active. It's
>> basically using Arch Linux default options.
>>
>> The output of "mount" after setting 10 seconds commit interval:
>>
>> /dev/sdc1 on /mnt/rec type btrfs
>> (rw,relatime,space_cache,commit=10,subvolid=5,subvol=/)
>>
>> Also tried noatime, but didn't make a difference either.
>>
>> Best regards
>> Sebastian
>>
>>> Thanks,
>>> Qu >>
>>>> I saw this behavior on two different machines with kernels 4.14.13 and
>>>> 4.14.5, both Arch Linux. btrfs-progs 4.14, OBS 20.1.3-241-gf5c3af1b
>>>> built from git.
>>>>
>>>> Best regards
>>>> Sebastian
>>>>
>>>> [1] https://github.com/jp9000/obs-studio
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe
>>>> linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-20 10:47 Periodic frame losses when recording to btrfs volume with OBS Sebastian Ochmann
2018-01-21 10:04 ` Qu Wenruo
@ 2018-01-22 14:27 ` Chris Mason
1 sibling, 0 replies; 15+ messages in thread
From: Chris Mason @ 2018-01-22 14:27 UTC (permalink / raw)
To: Sebastian Ochmann, linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 2608 bytes --]
On 01/20/2018 05:47 AM, Sebastian Ochmann wrote:
> Hello,
>
> I would like to describe a real-world use case where btrfs does not
> perform well for me. I'm recording 60 fps, larger-than-1080p video using
> OBS Studio [1] where it is important that the video stream is encoded
> and written out to disk in real-time for a prolonged period of time (2-5
> hours). The result is a H264 video encoded on the GPU with a data rate
> ranging from approximately 10-50 MB/s.
>
> The hardware used is powerful enough to handle this task. When I use a
> XFS volume for recording, no matter whether it's a SSD or HDD, the
> recording is smooth and no frame drops are reported (OBS has a nice
> Stats window where it shows the number of frames dropped due to encoding
> lag which seemingly also includes writing the data out to disk).
>
> However, when using a btrfs volume I quickly observe severe, periodic
> frame drops. It's not single frames but larger chunks of frames that are
> dropped at a time. I tried mounting the volume with nobarrier but to no
> avail.
>
> Of course, the simple fix is to use a FS that works for me(TM). However
> I thought since this is a common real-world use case I'd describe the
> symptoms here in case anyone is interested in analyzing this behavior.
> It's not immediately obvious that the FS makes such a difference. Also,
> if anyone has an idea what I could try to mitigate this issue (mount or
> mkfs options?) I can try that.
>
> I saw this behavior on two different machines with kernels 4.14.13 and
> 4.14.5, both Arch Linux. btrfs-progs 4.14, OBS 20.1.3-241-gf5c3af1b
> built from git.
>
This could be a few different things, trying without the space cache was
already suggested, and that's a top suspect.
How does the application do the writes? Are they always 4K aligned or
does it send them out in odd sizes?
The easiest way to nail it down is to use offcputime from the iovisor
project:
https://github.com/iovisor/bcc/blob/master/tools/offcputime.py
If you haven't already configured this it may take a little while, but
it's the perfect tool for this problem.
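(A sketch of an invocation fitting this case; the tool path, the process name
taken from the ffmpeg-mux traces later in the thread, and the 60s duration are
all assumptions:)

/usr/share/bcc/tools/offcputime -K -p $(pidof ffmpeg-mux) 60 > offcpu.txt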
Otherwise, if the stalls are long enough you can try to catch it with
/proc/<pid>/stack. I've attached a helper script I often use to dump
the stack trace of all the tasks in D state.
Just run walker.py and it should give you something useful. You can use
walker.py -a to see all the tasks instead of just D state. This just
walks /proc/<pid>/stack, so you'll need to run it as someone with
permissions to see the stack traces of the procs you care about.
-chris
[-- Attachment #2: walker.py --]
[-- Type: text/x-python, Size: 6762 bytes --]
#!/usr/bin/env python3
#
# this walks all the tasks on the system and prints out a stack trace
# of any tasks waiting in D state. If you pass -a, it will print out
# the stack of every task it finds.
#
# It also makes a histogram of the common stacks so you can see where
# more of the tasks are.
#
import sys
import os
from optparse import OptionParser
usage = "usage: %prog [options]"
parser = OptionParser(usage=usage)
parser.add_option("-a", "--all_tasks", help="Dump all stacks", default=False,
action="store_true")
parser.add_option("-s", "--smaps", help="Dump /proc/pid/smaps", default=False,
action="store_true")
parser.add_option("-S", "--sort", help="/proc/pid/smaps sort key",
default="size", type="str")
parser.add_option("-p", "--pid", help="Filter on pid", default=None,
type="str")
parser.add_option("-c", "--command", help="Filter on command name",
default=None, type="str")
parser.add_option("-f", "--files", help="don't collapse open files",
default=False, action="store_true")
parser.add_option("-v", "--verbose", help="details++", default=False,
action="store_true")
(options, args) = parser.parse_args()
stacks = {}
maps = {}
lines = []
# parse the units from a number and normalize into KB
def parse_number(s):
try:
words = s.split()
unit = words[-1].lower()
number = int(words[1])
tag = words[0].lower().rstrip(':')
# we store in kb
if unit == "mb":
number = number * 1024
elif unit == "gb":
number = number * 1024 * 1024
elif unit == "tb":
number = number * 1024 * 1024
return (tag, number)
except:
return (None, None)
# pretty print a number in KB with units
def print_number(num):
# we store the number in kb
units = ['KB', 'MB', 'GB', 'TB']
index = 0
while num > 1024:
num /= 1024
index += 1
final = float(num + num / 1024)
return (final, units[index])
# find a given line in the record and pretty print it
def print_line(header, record, tag):
num, unit = print_number(record[tag])
if options.verbose:
header = header + " "
else:
header = ""
print("\t%s%s: %.2f %s" % (header, tag.capitalize(), num, unit))
# print all the lines we care about in a given record
def print_record(record):
tag = record['tag']
print(tag)
print_line(tag, record, "size")
print_line(tag, record, "rss")
print_line(tag, record, "pss")
print_line(tag, record, "anonymous")
print_line(tag, record, "swap")
print_line(tag, record, "locked")
# stuff a record with a group of lines from /proc/pid/smaps
# into the smaps hash. This does all the collapsing by cmd name
# unless we're in -a mode
#
def add_record(cmd, pid, record):
global maps
if not record:
return
val = record["tag"]
if val[0] != '[':
if val.find('/') < 0:
val = "[anonymous]"
elif options.files:
val = "[files:%s]" % val
else:
val = "[files]"
if not options.verbose and val.startswith('[stack:'):
val = val[0:6] + "]"
if options.all_tasks:
val = "%s:%s:%s" % (cmd, pid, val)
else:
val = "%s:%s" % (cmd, val)
proc = maps.setdefault(val, {"tag": val})
for x in record:
if x != "tag":
e = proc.get(x, 0)
proc[x] = e + record[x]
# open the smap file for a given pid and collect all the records
def add_smap(path, pid, cmd):
try:
fd = open(os.path.join(path, "smaps"), "r")
except:
return
record = {}
while True:
l = fd.readline()
if not l:
break
l = l.lstrip().rstrip()
if l.find('-') >= 0:
add_record(cmd, pid, record)
record = {}
words = l.split()
# if the tag includes information about filenames
# add them in
#
if len(words) > 5:
record['tag'] = " ".join(words[5:])
else:
record['tag'] = words[-1]
continue
tag, number = parse_number(l)
if tag:
record[tag] = number
add_record(cmd, pid, record)
# read /proc/pid/stack and add it to the hashes
def add_stack(path, pid, cmd, status):
global stacks
try:
stack = open(os.path.join(path, "stack"), "r").read()
except:
return
if (status != "D" and not options.all_tasks):
return
print("%s %s %s" % (pid, cmd, status))
print(stack)
v = stacks.get(stack)
if v:
v += 1
else:
v = 1
stacks[stack] = v
# worker to read all the files for one individual task
def run_one_task(path):
try:
stat = open(os.path.join(path, "stat"), "r").read()
except:
pass
words = stat.split()
pid, cmd, status = words[0:3]
cmd = cmd.lstrip('(')
cmd = cmd.rstrip(')')
if options.command and options.command != cmd:
return
if options.smaps:
add_smap(path, pid, cmd)
else:
add_stack(path, pid, cmd, status)
def print_usage():
sys.stderr.write("Usage: %s [-a]\n" % sys.argv[0])
sys.exit(1)
# for a given pid in string form, read the files from proc
def run_pid(name):
try:
pid = int(name)
except:
return
if options.smaps:
p = os.path.join("/proc", name)
try:
run_one_task(p)
except:
pass
else:
p = os.path.join("/proc", name, "task")
if not os.path.exists(p):
return
try:
for t in os.listdir(p):
run_one_task(os.path.join(p, t))
except:
raise
if options.pid:
run_pid(options.pid)
else:
for name in os.listdir("/proc"):
run_pid(name)
if options.smaps:
key = options.sort.lower()
sort = {}
for x in maps:
record = maps[x]
val = record.get(key, 0)
sort.setdefault(val, []).append(x)
ar = list(sort.keys())
ar.sort()
for x in ar:
bucket = sort[x]
for x in bucket:
print_record(maps[x])
else:
values = {}
for stack, count in stacks.items():
l = values.setdefault(count, [])
l.append(stack)
counts = list(values.keys())
counts.sort(reverse=True)
if counts:
print("-----\nstack summary\n")
for x in counts:
if x == 1:
print("1 hit:")
else:
print("%d hits: " % x)
l = values[x]
for stack in l:
print(stack)
print("-----")
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-22 0:39 ` Qu Wenruo
2018-01-22 9:19 ` Nikolay Borisov
@ 2018-01-22 18:33 ` Sebastian Ochmann
2018-01-22 19:08 ` Chris Mason
1 sibling, 1 reply; 15+ messages in thread
From: Sebastian Ochmann @ 2018-01-22 18:33 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs, nborisov, clm
First off, thank you for all the responses! Let me reply to multiple
suggestions at once in this mail.
On 22.01.2018 01:39, Qu Wenruo wrote:
> Either such mount option has a bug, or some unrelated problem.
>
> As you mentioned the output is about 10~50MiB/s, 30s means 300~1500MiBs.
> Maybe it's related to the dirty data amount?
>
> Would you please verify if a lower or higher profile (resulting much
> larger or smaller data stream) would affect?
A much lower rate seems to mitigate the problem somewhat; however, I'm
talking about low single-digit MB/s when the problem seems to vanish.
But even with low but more realistic amounts of data, the drops still
happen.
> Despite that, I'll dig to see if commit= option has any bug.
>
> And you could also try the nospace_cache mount option provided by Chris
> Murphy, which may also help.
I tried the nospace_cache option but it doesn't seem to make a
difference to me.
On 22.01.2018 15:27, Chris Mason wrote:
> This could be a few different things, trying without the space cache was
> already suggested, and that's a top suspect.
>
> How does the application do the writes? Are they always 4K aligned or
> does it send them out in odd sizes?
>
> The easiest way to nail it down is to use offcputime from the iovisor
> project:
>
>
> https://github.com/iovisor/bcc/blob/master/tools/offcputime.py
>
> If you haven't already configured this it may take a little while, but
> it's the perfect tool for this problem.
>
> Otherwise, if the stalls are long enough you can try to catch it with
> /proc/<pid>/stack. I've attached a helper script I often use to dump
> the stack trace of all the tasks in D state.
>
> Just run walker.py and it should give you something useful. You can use
> walker.py -a to see all the tasks instead of just D state. This just
> walks /proc/<pid>/stack, so you'll need to run it as someone with
> permissions to see the stack traces of the procs you care about.
>
> -chris
I tried the walker.py script and was able to catch stack traces when the
lag happens. I'm pasting two traces at the end of this mail - one when
it happened using a USB-connected HDD and one when it happened on a SATA
SSD. The latter is encrypted, hence the dmcrypt_write process. Note
however that my original problem appeared on a SSD that was not encrypted.
In reply to the mail by Duncan:
> 64 GB RAM...
>
> Do you know about the /proc/sys/vm/dirty_* files and how to use/tweak
> them? If not, read $KERNDIR/Documentation/sysctl/vm.txt, focusing on
> these files.
At least I have never tweaked those settings yet. I certainly didn't
know about the foreground/background distinction, that is really
interesting. Thank you for the very extensive info and guide btw!
> So try setting something a bit more reasonable and see if it helps. That
> 1% ratio at 16 GiB RAM for ~160 MB was fine for me, but I'm not doing
> critical streaming, and at 64 GiB you're looking at ~640 MB per 1%, as I
> said, too chunky. For streaming, I'd suggest something approaching the
> value of your per-second IO bandwidth, we're assuming 100 MB/sec here so
> 100 MiB but let's round that up to a nice binary 128 MiB, for the
> background value, perhaps half a GiB or 5 seconds worth of writeback time
> for foreground, 4 times the background value. So:
>
> vm.dirty_background_bytes = 134217728 # 128*1024*1024, 128 MiB
> vm.dirty_bytes = 536870912 # 512*1024*1024, 512 MiB
Now I have good and bad news. The good news is that setting these
tunables to different values does change something. The bad news is that
lowering these values only seems to make the lag and frame drops happen
sooner/more frequently. I have also tried lowering the background bytes
to, say, 128 MB while setting the non-background bytes to 1 or 2 GB, but
even the background writeback already seems to have a bad enough effect
to start dropping frames. :( When writing to the SSD, the effect seems to
be mitigated a little bit, but frame drops still occur quickly, which is
unacceptable given that the system is generally able to do better.
By the way, as you can see from the stack traces, in the SSD case blk_mq
is in use.
> But I know less about that stuff and it's googlable, should you decide to
> try playing with it too. I know what the dirty_* stuff does from
> personal experience. =:^)
"I know what the dirty_* stuff does from personal experience. =:^)"
sounds quite interesting... :D
Best regards and thanks again
Sebastian
First stack trace:
690 usb-storage D
[<ffffffffc03f4404>] usb_sg_wait+0xf4/0x150 [usbcore]
[<ffffffffc0bd53a3>] usb_stor_bulk_transfer_sglist.part.1+0x63/0xb0
[usb_storage]
[<ffffffffc0bd5439>] usb_stor_bulk_srb+0x49/0x80 [usb_storage]
[<ffffffffc0bd55d3>] usb_stor_Bulk_transport+0x163/0x3d0 [usb_storage]
[<ffffffffc0bd5f07>] usb_stor_invoke_transport+0x37/0x4c0 [usb_storage]
[<ffffffffc0bd7518>] usb_stor_control_thread+0x1d8/0x2c0 [usb_storage]
[<ffffffff9b09f038>] kthread+0x118/0x130
[<ffffffff9b8002cf>] ret_from_fork+0x1f/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
2505 kworker/u16:2 D
[<ffffffff9b0ac182>] io_schedule+0x12/0x40
[<ffffffff9b347f48>] wbt_wait+0x1b8/0x340
[<ffffffff9b318536>] blk_mq_make_request+0xe6/0x6e0
[<ffffffff9b30c85e>] generic_make_request+0x11e/0x310
[<ffffffff9b30cabc>] submit_bio+0x6c/0x130
[<ffffffffc062fd77>] run_scheduled_bios+0x197/0x580 [btrfs]
[<ffffffffc063b0d0>] btrfs_worker_helper+0x70/0x330 [btrfs]
[<ffffffff9b0990eb>] process_one_work+0x1db/0x410
[<ffffffff9b09934b>] worker_thread+0x2b/0x3d0
[<ffffffff9b09f038>] kthread+0x118/0x130
[<ffffffff9b8002cf>] ret_from_fork+0x1f/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
Second stack trace:
618 dmcrypt_write D
[<ffffffff9b0ac182>] io_schedule+0x12/0x40
[<ffffffff9b347f48>] wbt_wait+0x1b8/0x340
[<ffffffff9b318536>] blk_mq_make_request+0xe6/0x6e0
[<ffffffff9b30c85e>] generic_make_request+0x11e/0x310
[<ffffffffc0351b6e>] dmcrypt_write+0x21e/0x240 [dm_crypt]
[<ffffffff9b09f038>] kthread+0x118/0x130
[<ffffffff9b8002cf>] ret_from_fork+0x1f/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
2866 ffmpeg-mux D
[<ffffffffc06251b1>] btrfs_start_ordered_extent+0x101/0x130 [btrfs]
[<ffffffffc061b0b0>] lock_and_cleanup_extent_if_need+0x340/0x380 [btrfs]
[<ffffffffc061c3c1>] __btrfs_buffered_write+0x261/0x740 [btrfs]
[<ffffffffc06202bf>] btrfs_file_write_iter+0x20f/0x650 [btrfs]
[<ffffffff9b235309>] __vfs_write+0xf9/0x170
[<ffffffff9b23551d>] vfs_write+0xad/0x1a0
[<ffffffff9b235762>] SyS_write+0x52/0xc0
[<ffffffff9b800083>] entry_SYSCALL_64_fastpath+0x1a/0x7d
[<ffffffffffffffff>] 0xffffffffffffffff
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-22 18:33 ` Sebastian Ochmann
@ 2018-01-22 19:08 ` Chris Mason
2018-01-22 21:17 ` Sebastian Ochmann
0 siblings, 1 reply; 15+ messages in thread
From: Chris Mason @ 2018-01-22 19:08 UTC (permalink / raw)
To: Sebastian Ochmann, Qu Wenruo, linux-btrfs, nborisov
On 01/22/2018 01:33 PM, Sebastian Ochmann wrote:
[ skipping to the traces ;) ]
> 2866 ffmpeg-mux D
> [<ffffffffc06251b1>] btrfs_start_ordered_extent+0x101/0x130 [btrfs]
> [<ffffffffc061b0b0>] lock_and_cleanup_extent_if_need+0x340/0x380 [btrfs]
> [<ffffffffc061c3c1>] __btrfs_buffered_write+0x261/0x740 [btrfs]
> [<ffffffffc06202bf>] btrfs_file_write_iter+0x20f/0x650 [btrfs]
> [<ffffffff9b235309>] __vfs_write+0xf9/0x170
> [<ffffffff9b23551d>] vfs_write+0xad/0x1a0
> [<ffffffff9b235762>] SyS_write+0x52/0xc0
> [<ffffffff9b800083>] entry_SYSCALL_64_fastpath+0x1a/0x7d
> [<ffffffffffffffff>] 0xffffffffffffffff
This is where we wait for writes that are already in flight before we're
allowed to redirty those pages in the file. It'll happen when we either
overwrite a page in the file that we've already written, or when we're
trickling down writes slowly in non-4K aligned writes.
You can probably figure out pretty quickly which is the case by stracing
ffmpeg-mux. Since lower dirty ratios made it happen more often for you,
my guess is the app is sending down unaligned writes.
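(A sketch of what that could look like; the process lookup is an assumption,
and only the write-family syscalls are traced to keep the sizes visible:)

strace -f -tt -e trace=write,pwrite64 -p $(pidof ffmpeg-mux) 2>&1 | head -n 50

The byte count each call returns is the number at the end of every line;
values that aren't multiples of 4096 would point at the unaligned case.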
-chris
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-22 19:08 ` Chris Mason
@ 2018-01-22 21:17 ` Sebastian Ochmann
2018-01-24 17:52 ` Chris Mason
0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Ochmann @ 2018-01-22 21:17 UTC (permalink / raw)
To: Chris Mason, Qu Wenruo, linux-btrfs, nborisov
Hello,
I attached to the ffmpeg-mux process for a little while and pasted the
result here:
https://pastebin.com/XHaMLX8z
Can you help me with interpreting this result? If you'd like me to run
strace with specific options, please let me know. This is a level of
debugging I'm not dealing with on a daily basis. :)
Best regards
Sebastian
On 22.01.2018 20:08, Chris Mason wrote:
> On 01/22/2018 01:33 PM, Sebastian Ochmann wrote:
>
> [ skipping to the traces ;) ]
>
>> 2866 ffmpeg-mux D
>> [<ffffffffc06251b1>] btrfs_start_ordered_extent+0x101/0x130 [btrfs]
>> [<ffffffffc061b0b0>] lock_and_cleanup_extent_if_need+0x340/0x380 [btrfs]
>> [<ffffffffc061c3c1>] __btrfs_buffered_write+0x261/0x740 [btrfs]
>> [<ffffffffc06202bf>] btrfs_file_write_iter+0x20f/0x650 [btrfs]
>> [<ffffffff9b235309>] __vfs_write+0xf9/0x170
>> [<ffffffff9b23551d>] vfs_write+0xad/0x1a0
>> [<ffffffff9b235762>] SyS_write+0x52/0xc0
>> [<ffffffff9b800083>] entry_SYSCALL_64_fastpath+0x1a/0x7d
>> [<ffffffffffffffff>] 0xffffffffffffffff
>
> This is where we wait for writes that are already in flight before we're
> allowed to redirty those pages in the file. It'll happen when we either
> overwrite a page in the file that we've already written, or when we're
> trickling down writes slowly in non-4K aligned writes.
>
> You can probably figure out pretty quickly which is the case by stracing
> ffmpeg-mux. Since lower dirty ratios made it happen more often for you,
> my guess is the app is sending down unaligned writes.
>
> -chris
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-22 8:59 ` Duncan
@ 2018-01-23 8:38 ` ein
2018-01-24 1:32 ` Duncan
0 siblings, 1 reply; 15+ messages in thread
From: ein @ 2018-01-23 8:38 UTC (permalink / raw)
To: Duncan, linux-btrfs
On 01/22/2018 09:59 AM, Duncan wrote:
> Sebastian Ochmann posted on Sun, 21 Jan 2018 16:27:55 +0100 as excerpted:
> [...]
> On 2018年01月20日 18:47, Sebastian Ochmann wrote:
>>>> Hello,
>>>>
>>>> I would like to describe a real-world use case where btrfs does not
>>>> perform well for me. I'm recording 60 fps, larger-than-1080p video
>>>> using OBS Studio [1] where it is important that the video stream is
>>>> encoded and written out to disk in real-time for a prolonged period of
>>>> time (2-5 hours). The result is a H264 video encoded on the GPU with a
>>>> data rate ranging from approximately 10-50 MB/s.
>>>
>>>> The hardware used is powerful enough to handle this task. When I use a
>>>> XFS volume for recording, no matter whether it's a SSD or HDD, the
>>>> recording is smooth and no frame drops are reported (OBS has a nice
>>>> Stats window where it shows the number of frames dropped due to
>>>> encoding lag which seemingly also includes writing the data out to
>>>> disk).
>>>>
>>>> However, when using a btrfs volume I quickly observe severe, periodic
>>>> frame drops. It's not single frames but larger chunks of frames that are
>>>> dropped at a time. I tried mounting the volume with nobarrier but to
>>>> no avail.
>>> What's the drop interval? Something near 30s?
>>> If so, try mount option commit=300 to see if it helps.
>> [...]
> 64 GB RAM...
>
> Do you know about the /proc/sys/vm/dirty_* files and how to use/tweak
> them? If not, read $KERNDIR/Documentation/sysctl/vm.txt, focusing on
> these files.
>
> These tunables control the amount of writeback cache that is allowed to
> accumulate before the system starts flushing it. The problem is that the
> defaults for these tunables were selected back when system memory
> normally measured in the MiB, not the GiB of today, so the default ratios
> allow too much dirty data to accumulate before attempting to flush it to
> storage, resulting in flush storms that hog the available IO and starve
> other tasks that might be trying to use it.
>
> The fix is to tweak these settings to try to smooth things out, starting
> background flush earlier, so with a bit of luck the system never hits
> high priority foreground flush mode, or if it does there's not so much to
> be written as much of it has already been done in the background.
>
> There are five files, two pairs of files, one pair controlling foreground
> sizes, the other background, and one file setting the time limit. The
> sizes can be set by either ratio, percentage of RAM, or bytes, with the
> other appearing as zero when read.
>
> To set these temporarily you write to the appropriate file. Once you
> have a setting that works well for you, write it to your distro's sysctl
> configuration (/etc/sysctl.conf or /etc/sysctrl.d/*.conf, usually), and
> it should be automatically applied at boot for you.
>
> Here's the settings in my /etc/sysctl.conf, complete with notes about the
> defaults and the values I've chosen for my 16G of RAM. Note that while I
> have fast ssds now, I set these values back when I had spinning rust. I
> was happy with them then, and while I shouldn't really need the settings
> on my ssds, I've seen no reason to change them.
>
> At 16G, 1% ~ 160M. At 64G, it'd be four times larger, 640M, likely too
> chunky a granularity to be useful, so you'll probably want to set the
> bytes value instead of ratio.
>
> # write-cache, foreground/background flushing
> # vm.dirty_ratio = 10 (% of RAM)
> # make it 3% of 16G ~ half a gig
> vm.dirty_ratio = 3
> # vm.dirty_bytes = 0
>
> # vm.dirty_background_ratio = 5 (% of RAM)
> # make it 1% of 16G ~ 160 M
> vm.dirty_background_ratio = 1
> # vm.dirty_background_bytes = 0
>
> # vm.dirty_expire_centisecs = 2999 (30 sec)
> # vm.dirty_writeback_centisecs = 499 (5 sec)
> # make it 10 sec
> vm.dirty_writeback_centisecs = 1000
>
>
> Now the other factor in the picture is how fast your actual hardware can
> write. hdparm's -t parameter actually times buffered sequential reads,
> but it still gives a rough idea of the device's sequential throughput.
> You'll need to run it as root:
>
> hdparm -t /dev/sda
>
> /dev/sda:
> Timing buffered disk reads: 1578 MB in 3.00 seconds = 525.73 MB/sec
>
> ... Like I said, fast ssd... I believe fast modern spinning rust should
> be 100 MB/sec or so, tho slower devices may only do 30 MB/sec, likely too
> slow for your reported 10-50 MB/sec stream, tho you say yours should be
> fast enough as it's fine with xfs.
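>
> (hdparm doesn't test writes; for a rough idea of sequential write speed,
> something like the following on the target filesystem should do, the path
> being just a placeholder, and remember to delete the test file
> afterwards:)
>
> # conv=fdatasync forces the data to disk before dd reports the rate
> dd if=/dev/zero of=/mnt/recording/ddtest bs=1M count=1024 conv=fdatasync
> rm /mnt/recording/ddtest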
>
>
> Now here's the problem. As Qu mentions elsewhere on-thread, 30 seconds
> of your 10-50 MB/sec stream is 300-1500 MiB. Say your available device
> IO bandwidth is 100 MiB/sec. That should be fine. But the default
> dirty_* settings allow 5% of RAM in dirty writeback cache before even
> starting low priority background flush, while it won't kick to high
> priority until 10% of RAM or 30 seconds, whichever comes first.
>
> And at 64 GiB RAM, 1% is as I said, about 640 MiB, so 10% is 6.4 GB dirty
> before it kicks to high priority, and 3.2 GB is the 5% accumulation
> before it even starts low priority background writing. That's assuming
> the 30 second timeout hasn't expired yet, of course.
>
> But as we established above the write stream maxes out at ~1.5 GiB in 30
> seconds, and that's well below the ~3.2 GiB @ 64 GiB RAM that would kick
> in even low priority background writeback!
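>
> To put those numbers side by side (all approximate):
>
> 64 GiB RAM, 1%                      ~ 640 MiB
> default 5% background threshold     ~ 3.2 GiB
> default 10% foreground threshold    ~ 6.4 GiB
> 50 MB/s stream for 30 seconds       ~ 1.5 GiB (worst case per expiry)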
>
> So at the defaults, the background writeback never kicks in at all, until
> the 30 second timeout expires, forcing immediate high priority foreground
> flushing!
>
> Meanwhile, the way the kernel handles /background/ writeback flushing is
> that it will take the opportunity to writeback what it can while the
> device is idle. But as we've just established, background never kicks in.
>
> So then the timeout expires and the kernel kicks in high priority
> foreground writeback.
>
> And the kernel handles foreground writeback *MUCH* differently!
> Basically, it stops anything attempting to dirty more writeback cache
> until it can write the dirty cache out. And it charges the time it
> spends doing just that to the thread it stopped in order to do that
> high priority writeback!
>
> Now as designed this should work well, and it does when the dirty_*
> values are set correctly, because any process that's trying to dirty the
> writeback cache faster than it can be written out, thus kicking in
> foreground mode, gets stopped until the data can be written out, thus
> preventing it from dirtying even MORE cache faster than the system can
> handle it, which in /theory/ is what kicked it into high priority
> foreground mode in the /first/ place.
>
> But as I said, the default ratios were selected when memory was far
> smaller. With half a gig of RAM, the default 5% to kick in background
> mode would be only ~25 MiB, easily writable within a second on modern
> devices, and even back then writable within say 5-10 seconds. And if it
> ever reached foreground mode, that would still be only 50 MiB worth, and
> it would still complete in well under the 30 seconds before the next
> expiry.
>
> But with modern RAM levels, my 16 GiB to some extent and your 64 GiB is
> even worse, as we've seen, even our max ~1500 MiB doesn't kick in
> background writeback mode, so the stuff just sits there until it expires
> and then it gets slammed into high priority foreground mode, stopping your
> streaming until it has a chance to write some of that dirty data out.
>
> And at our assumed 100 MiB/sec IO bandwidth, that 300-1500 MiB is going
> to take 3-15 seconds to write out, well within the 30 seconds before the
> next expiry, but for a time-critical streaming app, stopping it even the
> minimal 3 seconds is very likely to drop frames!
>
>
> So try setting something a bit more reasonable and see if it helps. That
> 1% ratio at 16 GiB RAM for ~160 MB was fine for me, but I'm not doing
> critical streaming, and at 64 GiB you're looking at ~640 MB per 1%, as I
> said, too chunky. For streaming, I'd suggest something approaching your
> per-second IO bandwidth for the background value (we're assuming 100
> MB/sec here, so 100 MiB, rounded up to a nice binary 128 MiB), and for
> foreground perhaps half a GiB, about 5 seconds worth of writeback at that
> rate and 4 times the background value. So:
>
> vm.dirty_background_bytes = 134217728 # 128*1024*1024, 128 MiB
> vm.dirty_bytes = 536870912 # 512*1024*1024, 512 MiB
>
>
> As mentioned, try writing those values directly into /proc/sys/vm/
> dirty_background_bytes and dirty_bytes , first, to see if it helps. If
> my guess is correct, that should vastly improve the situation for you.
> If it does but not quite enough or you just want to try tweaking some
> more, you can tweak it from there, but those are reasonable starting
> values and really should work far better than the default 5% and 10% of
> RAM with 64 GiB of it!
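>
> Concretely, as root (untested off the top of my head, but those are the
> exact file names under /proc/sys/vm/):
>
> echo 134217728 > /proc/sys/vm/dirty_background_bytes
> echo 536870912 > /proc/sys/vm/dirty_bytes
>
> # the *_ratio counterparts should now read back as 0
> grep . /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio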
>
>
> Other things to try tweaking include the IO scheduler -- the default is
> the venerable CFQ but deadline may well be better for a streaming use-
> case, and now there's the new multi-queue stuff and the multi-queue kyber
> and bfq schedulers, as well -- and setting IO priority -- probably by
> increasing the IO priority of the streaming app. The tool to use for the
> latter is called ionice. Do note, however, that not all schedulers
> implement IO priorities. CFQ does, but while I think deadline should
> work better for the streaming use-case, it's simpler code and I don't
> believe it implements IO priority. Similarly for multi-queue, I'd guess
> that kyber, a deliberately lightweight scheduler aimed at fast directly
> attached PCIe SSDs, doesn't implement IO priorities, while the more
> complex, general-purpose bfq (also suited to spinning rust) /might/
> implement them.
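>
> The mechanics are simple enough, at least; something like the following,
> with sda and the pid being placeholders and the scheduler names depending
> on whether your kernel uses the multi-queue block layer:
>
> # list available schedulers, the active one is in brackets
> cat /sys/block/sda/queue/scheduler
>
> # switch the scheduler for that device until reboot
> echo deadline > /sys/block/sda/queue/scheduler
>
> # bump the recording process to the highest best-effort IO priority
> # (only honored by schedulers that actually implement IO priorities)
> ionice -c2 -n0 -p <obs-pid>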
>
> But I know less about that stuff and it's googlable, should you decide to
> try playing with it too. I know what the dirty_* stuff does from
> personal experience. =:^)
>
>
> And to tie up a loose end, xfs has somewhat different design principles
> and may well not be particularly sensitive to the dirty_* settings, while
> btrfs, due to COW and other design choices, is likely more sensitive to
> them than the widely used ext* and reiserfs (my old choice and the basis
> of my own settings, above).
Excellent, book-like writeup showing how /proc/sys/vm/ works, but I
wonder, can you explain why XFS works fine in this case?
> --
> PGP Public Key (RSA/4096b):
> ID: 0xF2C6EA10
> SHA-1: 51DA 40EE 832A 0572 5AD8 B3C0 7AFF 69E1 F2C6 EA10
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-23 8:38 ` ein
@ 2018-01-24 1:32 ` Duncan
0 siblings, 0 replies; 15+ messages in thread
From: Duncan @ 2018-01-24 1:32 UTC (permalink / raw)
To: linux-btrfs
ein posted on Tue, 23 Jan 2018 09:38:13 +0100 as excerpted:
> On 01/22/2018 09:59 AM, Duncan wrote:
>>
>> And to tie up a loose end, xfs has somewhat different design principles
>> and may well not be particularly sensitive to the dirty_* settings,
>> while btrfs, due to COW and other design choices, is likely more
>> sensitive to them than the widely used ext* and reiserfs (my old choice
>> and the basis of my own settings, above).
> Excellent, book-like writeup showing how /proc/sys/vm/ works, but I
> wonder, can you explain why XFS works fine in this case?
I can't, directly, which is why I glossed over it so fast above. I do
have some "educated guesswork", but that's _all_ it is, as I've not had
reason to get particularly familiar with xfs and its quirks. You'd have
to ask the xfs folks if my _guess_ is anything approaching reality, but
if you do please be clear that I explicitly said I don't know and that
this is simply my best guess based on the very limited exposure to xfs
discussions I've had.
So I'm not experience-familiar with xfs and other than what I've happened
across in cross-list threads here, know little about it except that it
was ported to Linux from other *ix. I understand the xfs port to
"native" is far more complete than that of zfs, for example.
Additionally, I know from various vfs discussion threads cross-posted to
this and other filesystem lists that xfs remains rather different than
some -- apparently (if I've gotten it right) it handles "objects" rather
than inodes and extents, for instance.
Apparently, if the vfs threads I've read are to be believed, xfs would
have some trouble with a proposed vfs interface that would allow requests
to write out and free N pages or N KiB of dirty RAM from the write
buffers in order to clear memory for other usage, because it tracks
objects rather than dirty pages/KiB of RAM. Sure it could do it, but it
wouldn't be an efficient enough operation to be worth the trouble for
xfs. So apparently xfs just won't make use of that feature of the
proposed new vfs API, there's nothing that says it /has/ to, after all --
it's proposed to be optional, not mandatory.
Now that discussion was in a somewhat different context than the
vm.dirty_* settings discussion here, but it seems reasonable to assume
that if xfs would have trouble converting objects to the size of the
memory they take in the one case, the /proc/sys/vm/dirty_* dirty writeback
cache tweaking features may not apply to xfs, at least in a direct/
intuitive way, either.
Which is why I suggested xfs might not be particularly sensitive to those
settings -- I don't know that it ignores them entirely, and it may use
them in /some/ way, possibly indirectly, but the evidence I've seen does
suggest that xfs may, if it uses those settings at all, not be as
sensitive to them as btrfs/reiserfs/ext*.
Meanwhile, due to the extra work btrfs does with checksumming and cow,
while AFAIK it uses the settings "straight", having them out of whack
likely has a stronger effect on btrfs than it does on ext* and reiserfs
(with reiserfs likely being slightly more strongly affected than ext*,
but not to the level of btrfs).
And there has indeed been confirmation on-list that adjusting these
settings *does* have a very favorable effect on btrfs for /some/ use-
cases.
(In one particular case, the posting was to the main LKML, but on btrfs
IIRC, and Linus got involved. I don't believe that led to the
/creation/ of the relatively new per-device throttling stuff as I believe
the patches were already around, but I suspect it may have led to their
integration in mainline a few kernel cycles earlier than they may have
been otherwise. Because it's a reasonably well known "secret" that the
default ratios are out of whack on modern systems, it's just not settled
what the new defaults /should/ be, so in the absence of agreement or
pressing problem, they remain as they are. But Linus blew his top as
he's known to do, he and others pointed the reporter at the vm.dirty_*
settings tho Linus wanted to know why the defaults were so insane for
today's machines, and tweaking those did indeed help. Then a kernel
cycle or two later the throttling options appeared in mainline, very
possibly as a result of Linus "routing around the problem" to some
extent.)
So in my head I have a picture of the possible continuum of vm.dirty_
effect that looks like this:
<- weak effect strong ->
zfs....xfs.....................ext*....reiserfs.................btrfs
zfs, no or almost no effect, because it uses non-native mechanism and is
poorly adapted to Linux.
xfs, possibly some effect, but likely relatively light, because its
mechanisms aren't completely adapted to Linux-vfs-native either, and if
it uses those settings at all it may well be via some indirect
translation mechanism.
ext* pretty near the center, because it has been the assumed default for
so long, and because much of the vfs stuff was ext* first and then later
moved to the generic vfs layer and exposed to other fs to use too.
reiserfs near center also, but a bit more affected because it's a bit
more complex, while still similar enough to use the settings directly.
btrfs on the strong effect end, because its implementation is quite
complex and features such as cow and checksumming increase the work it
must do, but it still works in size-based units (potentially unlike xfs
which apparently works with objects of varying sizes), so the effect of
out of adjustment settings is far stronger.
But as I said I have little experience and know little about zfs/xfs, so
particularly at that end, it's almost entirely extrapolation from
admittedly thin evidence and very possibly not entirely correct
conclusions I've drawn from the evidence I've seen, so if you're on xfs
(or zfs) or it otherwise strongly matters to you, I'd strongly suggest
double-checking with the xfs folks.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Periodic frame losses when recording to btrfs volume with OBS
2018-01-22 21:17 ` Sebastian Ochmann
@ 2018-01-24 17:52 ` Chris Mason
0 siblings, 0 replies; 15+ messages in thread
From: Chris Mason @ 2018-01-24 17:52 UTC (permalink / raw)
To: Sebastian Ochmann, Qu Wenruo, linux-btrfs, nborisov
On 01/22/2018 04:17 PM, Sebastian Ochmann wrote:
> Hello,
>
> I attached to the ffmpeg-mux process for a little while and pasted the
> result here:
>
> https://pastebin.com/XHaMLX8z
>
>
> Can you help me with interpreting this result? If you'd like me to run
> strace with specific options, please let me know. This is a level of
> debugging I'm not dealing with on a daily basis. :)
>
Going to guess it's these sequences:
lseek(3, 1302012898, SEEK_SET) = 1302012898
write(3, "\37C\266u\1\377\377\377\377\377\377\377\277\204|\271\347J\347\203\3@\0\243CY\202\0\0\0!\21"..., 262144) = 262144
write(3, "\310\22\323g7J#h\351\0\323\270\f\206\5\207(.\232\246\27\371/\376\341\0\0\200\th\3\37"..., 262144) = 262144
write(3, "\225*\245<8N\32\263\237k\331]\313\215\301\366$\7\216\0349\302AS\201\302\307T\361\365\3375"..., 262144) = 262144
write(3, "\272e\37\255\250\24n\235\341E\272Me\36'\345W\353\2337K.n\367\264\\\370\307\341_\206|"..., 262144) = 262144
write(3, "iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii"..., 53271) = 53271
lseek(3, 1302012902, SEEK_SET) = 1302012902
write(3, "\1\0\0\0\0\20\320\v", 8) = 8
lseek(3, 1303114745, SEEK_SET) = 1303114745
It's seeking, writing, then jumping back and updating what had been written
before.
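Something like this dd loop should approximate that pattern outside of OBS,
if anyone wants to experiment (rough sketch, file name and sizes made up;
it appends instead of lseek()ing to the end, which should be close enough):

# grow a file in 256 KiB writes, ~1 MiB per iteration, then seek back to
# the start of the chunk just written and update 8 bytes, like the muxer's
# index/length fixups in the strace above
i=0
while [ "$i" -lt 300 ]; do
        dd if=/dev/urandom of=test.mkv bs=256K count=4 \
           oflag=append conv=notrunc status=none
        dd if=/dev/zero of=test.mkv bs=8 count=1 \
           seek=$((i * 131072)) conv=notrunc status=none
        i=$((i + 1))
done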
That's going to hit the stable page writing code in btrfs that I had mentioned earlier.
At Facebook, we've been experimenting with fixes for this that are limited to O_APPEND
slowly growing log files. Let me think harder...
-chris
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2018-01-24 17:54 UTC | newest]
Thread overview: 15+ messages
2018-01-20 10:47 Periodic frame losses when recording to btrfs volume with OBS Sebastian Ochmann
2018-01-21 10:04 ` Qu Wenruo
2018-01-21 15:27 ` Sebastian Ochmann
2018-01-21 22:05 ` Chris Murphy
[not found] ` <CAJCQCtQOTNZZnkiw2Tq9Mgwnc4pykbOjCb2DCOm4iCjn5K9jQw@mail.gmail.com>
2018-01-21 22:33 ` Sebastian Ochmann
2018-01-22 0:39 ` Qu Wenruo
2018-01-22 9:19 ` Nikolay Borisov
2018-01-22 18:33 ` Sebastian Ochmann
2018-01-22 19:08 ` Chris Mason
2018-01-22 21:17 ` Sebastian Ochmann
2018-01-24 17:52 ` Chris Mason
2018-01-22 8:59 ` Duncan
2018-01-23 8:38 ` ein
2018-01-24 1:32 ` Duncan
2018-01-22 14:27 ` Chris Mason