* No space left on device, problem
@ 2013-10-26 19:46 Igor M
2013-10-26 21:00 ` Igor M
2013-10-26 21:35 ` Chris Murphy
0 siblings, 2 replies; 20+ messages in thread
From: Igor M @ 2013-10-26 19:46 UTC (permalink / raw)
To: linux-btrfs
Hello,
I just upgraded the kernel to 3.11.6, added a new disk and created a btrfs filesystem:
# mkfs.btrfs /dev/sdb
# mount -t btrfs -o compress=lzo,compress-force=lzo /dev/sdb /usr/local/mysql/data
I started copying files from the old disk and then got 'No space left on
device', even though there is plenty of free space.
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sdb 2.8T 110G 2.7T 4% /usr/local/mysql/data
# btrfs fi show
Label: none uuid: c0bfcb22-8b7c-4936-afcd-7acdf58f1d6c
Total devices 1 FS bytes used 108.68GB
devid 1 size 2.73TB used 113.04GB path /dev/sdb
# btrfs fi df /usr/local/mysql/data
Data: total=111.01GB, used=108.25GB
System, DUP: total=8.00MB, used=20.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=441.91MB
Metadata: total=8.00MB, used=0.00
I tried the balance command from the FAQ:
# btrfs fi balance start -dusage=5 /usr/local/mysql/data
Done, had to relocate 4 out of 117 chunks
But it didn't help. When I try to copy a file I still get 'No space
left on device'.
What should I do?
Regards,
Igor
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-26 19:46 Igor M
@ 2013-10-26 21:00 ` Igor M
2013-10-26 21:35 ` Chris Murphy
1 sibling, 0 replies; 20+ messages in thread
From: Igor M @ 2013-10-26 21:00 UTC (permalink / raw)
To: linux-btrfs
Some more info. The exact error message is:
cp: writing ‘/usr/local/mysql/data/gbdata/parts_0015.MYI’: No space left on device
cp: failed to extend ‘/usr/local/mysql/data/gbdata/parts_0015.MYI’: No space left on device
The files are between 2.7G and 7.7G in size.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-26 19:46 Igor M
2013-10-26 21:00 ` Igor M
@ 2013-10-26 21:35 ` Chris Murphy
2013-10-26 21:53 ` Igor M
1 sibling, 1 reply; 20+ messages in thread
From: Chris Murphy @ 2013-10-26 21:35 UTC (permalink / raw)
To: Igor M; +Cc: linux-btrfs
On Oct 26, 2013, at 1:46 PM, Igor M <igork20@gmail.com> wrote:
>
> # mount -t btrfs -o compress=lzo,compress-force=lzo /
Why do you have two compression mount options? You need to pick one of these.
> What to do ?
Are there any kernel messages reported by dmesg at the time the copy starts and fails? What's the exact copy command you're using?
Chris Murphy
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-26 21:35 ` Chris Murphy
@ 2013-10-26 21:53 ` Igor M
2013-10-26 22:17 ` Chris Murphy
0 siblings, 1 reply; 20+ messages in thread
From: Igor M @ 2013-10-26 21:53 UTC (permalink / raw)
To: linux-btrfs
On Sat, Oct 26, 2013 at 11:35 PM, Chris Murphy <lists@colorremedies.com> wrote:
>
> On Oct 26, 2013, at 1:46 PM, Igor M <igork20@gmail.com> wrote:
>>
>> # mount -t btrfs -o compress=lzo,compress-force=lzo /
>
> Why do you have two compression mount options? You need to pick one of these.
I removed one. I thought both were needed.
>
>> What to do ?
>
> Are there any kernel messages reported by dmesg at the time the copy starts and fails? What's the exact copy command you're using?
No messages. Just these when mounting:
device fsid c0bfcb22-8b7c-4936-afcd-7acdf58f1d6c devid 1 transid 622 /dev/sdb
btrfs: force lzo compression
btrfs: disk space caching is enabled
I even added enospc_debug mount option, still no messages.
I'm using a simple cp command:
# cp -a /mnt/old_hd/data/gbdata/* /usr/local/mysql/data/gbdata/
or, for one file:
# cp -a /mnt/old_hd/data/gbdata/parts_0016.MYD /usr/local/mysql/data/gbdata/
cp: writing ‘/usr/local/mysql/data/gbdata/parts_0016.MYD’: No space left on device
cp: failed to extend ‘/usr/local/mysql/data/gbdata/parts_0016.MYD’: No space left on device
I get the same error if I copy with, for example, Midnight Commander.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-26 21:53 ` Igor M
@ 2013-10-26 22:17 ` Chris Murphy
2013-10-26 22:22 ` Igor M
0 siblings, 1 reply; 20+ messages in thread
From: Chris Murphy @ 2013-10-26 22:17 UTC (permalink / raw)
To: Igor M; +Cc: linux-btrfs
On Oct 26, 2013, at 3:53 PM, Igor M <igork20@gmail.com> wrote:
>
> I even added enospc_debug mount option, still no messages.
If it were kernel enospc, you should have messages in dmesg.
What version of btrfs-progs did you use to make the btrfs volume?
>
> cp: failed to extend ‘/usr/local/mysql/data/gbdata/parts_0016.MYD’: No
> space left on device
Reboot with kernel parameter ignore_loglevel and retry the copy, and see if you now have anything in dmesg at the time of the copy.
>
> It's the same error if I try to copy for ex. with midnight commander.
I have the same kernel version, the same mount options, and use the same cp -a on a 5.3GB file and cannot reproduce your results.
Chris Murphy
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-26 22:17 ` Chris Murphy
@ 2013-10-26 22:22 ` Igor M
2013-10-26 22:50 ` Igor M
0 siblings, 1 reply; 20+ messages in thread
From: Igor M @ 2013-10-26 22:22 UTC (permalink / raw)
To: Chris Murphy; +Cc: linux-btrfs
On Sun, Oct 27, 2013 at 12:17 AM, Chris Murphy <lists@colorremedies.com> wrote:
>
> On Oct 26, 2013, at 3:53 PM, Igor M <igork20@gmail.com> wrote:
>
>>
>> I even added enospc_debug mount option, still no messages.
>
> If it were kernel enospc, you should have messages in dmesg.
>
> What version of btrfs progs when making the btrfs volume?
>
>>
>> cp: failed to extend ‘/usr/local/mysql/data/gbdata/parts_0016.MYD’: No
>> space left on device
>
> Reboot with kernel parameter ignore_loglevel and retry the copy, and see if you now have anything in dmesg at the time of the copy.
>
>
Some more info. These files are MySQL database files. The tables contain
only text, so they compress well.
But if I copy an incompressible file, for example a video, then
everything is OK: no error message, and the copy is successful.
I also tried mounting without compression (without compress-force=lzo),
and in this case no error is reported.
So it seems to be something related to compression?
I'll try rebooting with the ignore_loglevel parameter.
>>
>> It's the same error if I try to copy for ex. with midnight commander.
>
> I have the same kernel version, the same mount options, and use the same cp -a on a 5.3GB file and cannot reproduce your results.
>
>
> Chris Murphy
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-26 22:22 ` Igor M
@ 2013-10-26 22:50 ` Igor M
2013-10-26 22:54 ` Igor M
0 siblings, 1 reply; 20+ messages in thread
From: Igor M @ 2013-10-26 22:50 UTC (permalink / raw)
To: Chris Murphy; +Cc: linux-btrfs
Still no messages. The parameter seems to be active, as
/sys/module/printk/parameters/ignore_loglevel is Y, but there are no
messages in the log files or dmesg. Maybe I need to turn on some kernel
debugging option and recompile the kernel?
I should also mention that roughly 230G+ of data was copied before this
error started to occur.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-26 22:50 ` Igor M
@ 2013-10-26 22:54 ` Igor M
0 siblings, 0 replies; 20+ messages in thread
From: Igor M @ 2013-10-26 22:54 UTC (permalink / raw)
To: Chris Murphy; +Cc: linux-btrfs
I didn't see this question before. btrfs-progs was compiled today from git.
# btrfs version
Btrfs v0.20-rc1-358-g194aa4a
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
@ 2013-10-27 1:00 Tomasz Chmielewski
2013-10-27 8:50 ` Igor M
0 siblings, 1 reply; 20+ messages in thread
From: Tomasz Chmielewski @ 2013-10-27 1:00 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org, igork20
> Still no messages. Parameter seems to be active as
> /sys/module/printk/parameters/ignore_loglevel is Y, but there are no
> messages in log files or dmesg. Maybe I need to turn on some kernel
> debugging option and recompile kernel ?
> Also I should mention that cca 230G+ data was copied before this error
> started to occur.
I think I saw a similar issue before.
Can you try using rsync with "--bwlimit XY" option to copy the files?
The option limits the speed, in kB/s, at which the file is
copied; it works even when the source and destination files are both on
the local machine.
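To illustrate what such a rate limit does, here is a minimal Python sketch of a throttled copy. It is not how rsync implements --bwlimit; the function name throttled_copy and all of its parameters are made up for illustration, and the limit is taken to be in KiB per second.

```python
import time


def throttled_copy(src, dst, limit_kbps, chunk_size=64 * 1024,
                   sleep=time.sleep, clock=time.monotonic):
    """Copy src to dst, keeping the average rate at or below limit_kbps (KiB/s).

    `sleep` and `clock` are injectable so the pacing can be checked without
    actually waiting; by default the real time functions are used.
    """
    limit_bps = limit_kbps * 1024
    copied = 0
    start = clock()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            # Earliest moment at which `copied` bytes may have been written
            # without exceeding the limit; wait until then if we are early.
            earliest = start + copied / limit_bps
            delay = earliest - clock()
            if delay > 0:
                sleep(delay)
    return copied
```

The writer simply pauses whenever it gets ahead of the allowed average rate, which gives the filesystem more time between bursts of writes.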
--
Tomasz Chmielewski
http://wpkg.org
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-27 1:00 No space left on device, problem Tomasz Chmielewski
@ 2013-10-27 8:50 ` Igor M
2013-10-27 8:56 ` Brendan Hide
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Igor M @ 2013-10-27 8:50 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: linux-btrfs@vger.kernel.org
On Sun, Oct 27, 2013 at 2:00 AM, Tomasz Chmielewski <tch@virtall.com> wrote:
>> Still no messages. Parameter seems to be active as
>> /sys/module/printk/parameters/ignore_loglevel is Y, but there are no
>> messages in log files or dmesg. Maybe I need to turn on some kernel
>> debugging option and recompile kernel ?
>> Also I should mention that cca 230G+ data was copied before this error
>> started to occur.
>
> I think I saw a similar issue before.
>
> Can you try using rsync with "--bwlimit XY" option to copy the files?
>
> The option will limit the speed, in kB, at which the file is being
> copied; it will work even when source and destination files are on a
> local machine.
>
I also ran strace cp -a ..
...
read(3, "350348f07$0$24520$c3e8da3$fb4835"..., 65536) = 65536
write(4, "350348f07$0$24520$c3e8da3$fb4835"..., 65536) = 65536
read(3, "62.76C52BF412E849CB86D4FF3898B94"..., 65536) = 65536
write(4, "62.76C52BF412E849CB86D4FF3898B94"..., 65536) = -1 ENOSPC (No space left on device)
The last two write calls take a lot more time, and then the last one
returns ENOSPC. But if this write is retried, it succeeds.
I tried with Midnight Commander: when the error occurs and I choose Retry,
it finishes copying this file, until the error occurs again
at the next file.
With --bwlimit it seems to be better: the lower the speed, the later the
error occurs, and if it's slow enough the copy is successful.
But now I'm not sure anymore. I copied a few files with bwlimit, and
now suddenly the error doesn't occur anymore, even without bwlimit.
I'll do some more tests.
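The observation that a retried write() succeeds can be sketched as a small Python helper. This is only an illustration of the retry behaviour described above: it works around the symptom rather than fixing the cause. The name write_with_retry and its retries/delay parameters are hypothetical.

```python
import errno
import os
import time


def write_with_retry(fd, data, retries=5, delay=1.0,
                     write=os.write, sleep=time.sleep):
    """Write all of `data` to `fd`, retrying a write that fails with ENOSPC.

    Any other error, or ENOSPC more than `retries` times in a row, is
    re-raised. `write` and `sleep` are injectable for testing.
    """
    view = memoryview(data)
    attempts = 0
    while view:
        try:
            written = write(fd, view)
        except OSError as exc:
            if exc.errno != errno.ENOSPC or attempts >= retries:
                raise
            attempts += 1
            sleep(delay)  # give the filesystem time to flush, then retry
            continue
        view = view[written:]  # continue where the last write left off
        attempts = 0
    return len(data)
```

Because the helper resumes from the failed offset instead of restarting the file, it matches the "continue where it stopped" behaviour seen when choosing Retry.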
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-27 8:50 ` Igor M
@ 2013-10-27 8:56 ` Brendan Hide
2013-10-27 9:18 ` Igor M
2013-10-27 21:53 ` Igor M
2013-10-28 19:05 ` Josef Bacik
2 siblings, 1 reply; 20+ messages in thread
From: Brendan Hide @ 2013-10-27 8:56 UTC (permalink / raw)
To: Igor M, Tomasz Chmielewski; +Cc: linux-btrfs@vger.kernel.org
On 2013/10/27 10:50 AM, Igor M wrote:
> On Sun, Oct 27, 2013 at 2:00 AM, Tomasz Chmielewski <tch@virtall.com> wrote:
>>> Still no messages. Parameter seems to be active as
>>> /sys/module/printk/parameters/ignore_loglevel is Y, but there are no
>>> messages in log files or dmesg. Maybe I need to turn on some kernel
>>> debugging option and recompile kernel ?
>>> Also I should mention that cca 230G+ data was copied before this error
>>> started to occur.
>> I think I saw a similar issue before.
>>
>> Can you try using rsync with "--bwlimit XY" option to copy the files?
>>
>> The option will limit the speed, in kB, at which the file is being
>> copied; it will work even when source and destination files are on a
>> local machine.
>>
> Also I run strace cp -a ..
> ...
> read(3, "350348f07$0$24520$c3e8da3$fb4835"..., 65536) = 65536
> write(4, "350348f07$0$24520$c3e8da3$fb4835"..., 65536) = 65536
> read(3, "62.76C52BF412E849CB86D4FF3898B94"..., 65536) = 65536
> write(4, "62.76C52BF412E849CB86D4FF3898B94"..., 65536) = -1 ENOSPC (No
> space left on device)
>
> Last two write calls take a lot more time, and then last one returns
> ENOSPC. But if this write is retryed, then it succeeds.
> I tried with midnight commander and when error occurs, if I Retry
> operation then it finishes copying this file until error occurs again
> at next file.
>
> With --bwlimit it seems to be better, lower the speed later the error
> occurs, and if it's slow enough copy is successfull.
> But now I'm not sure anymore. I copied a few files with bwlimit, and
> now sudenly error doesn't occur anymore, even with no bwlimit.
> I'll do some more tests.
This sounds to me like the problem is related to read performance
causing a bork. This would explain why bwlimit helps, as well as why cp
works the second time around (since it is cached).
--
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-27 8:56 ` Brendan Hide
@ 2013-10-27 9:18 ` Igor M
0 siblings, 0 replies; 20+ messages in thread
From: Igor M @ 2013-10-27 9:18 UTC (permalink / raw)
To: Brendan Hide; +Cc: Tomasz Chmielewski, linux-btrfs@vger.kernel.org
On Sun, Oct 27, 2013 at 9:56 AM, Brendan Hide <brendan@swiftspirit.co.za> wrote:
> On 2013/10/27 10:50 AM, Igor M wrote:
>>
>> On Sun, Oct 27, 2013 at 2:00 AM, Tomasz Chmielewski <tch@virtall.com>
>> wrote:
>>>>
>>>> Still no messages. Parameter seems to be active as
>>>> /sys/module/printk/parameters/ignore_loglevel is Y, but there are no
>>>> messages in log files or dmesg. Maybe I need to turn on some kernel
>>>> debugging option and recompile kernel ?
>>>> Also I should mention that cca 230G+ data was copied before this error
>>>> started to occur.
>>>
>>> I think I saw a similar issue before.
>>>
>>> Can you try using rsync with "--bwlimit XY" option to copy the files?
>>>
>>> The option will limit the speed, in kB, at which the file is being
>>> copied; it will work even when source and destination files are on a
>>> local machine.
>>>
>> Also I run strace cp -a ..
>> ...
>> read(3, "350348f07$0$24520$c3e8da3$fb4835"..., 65536) = 65536
>> write(4, "350348f07$0$24520$c3e8da3$fb4835"..., 65536) = 65536
>> read(3, "62.76C52BF412E849CB86D4FF3898B94"..., 65536) = 65536
>> write(4, "62.76C52BF412E849CB86D4FF3898B94"..., 65536) = -1 ENOSPC (No
>> space left on device)
>>
>> Last two write calls take a lot more time, and then last one returns
>> ENOSPC. But if this write is retryed, then it succeeds.
>> I tried with midnight commander and when error occurs, if I Retry
>> operation then it finishes copying this file until error occurs again
>> at next file.
>>
>> With --bwlimit it seems to be better, lower the speed later the error
>> occurs, and if it's slow enough copy is successfull.
>> But now I'm not sure anymore. I copied a few files with bwlimit, and
>> now sudenly error doesn't occur anymore, even with no bwlimit.
>> I'll do some more tests.
>
> This sounds to me like the problem is related to read performance causing a
> bork. This would explain why bwlimit helps, as well as why cp works the
> second time around (since it is cached).
>
cp doesn't work the second time. cp always fails. If the last write() call
is retried, it works; I think this is what Midnight Commander does if you
choose 'Retry'. It doesn't copy from the beginning, it continues where it left off.
I also tried from a different disk, with the same result.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-27 8:50 ` Igor M
2013-10-27 8:56 ` Brendan Hide
@ 2013-10-27 21:53 ` Igor M
2013-10-27 22:46 ` Chris Murphy
2013-10-28 19:05 ` Josef Bacik
2 siblings, 1 reply; 20+ messages in thread
From: Igor M @ 2013-10-27 21:53 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: linux-btrfs@vger.kernel.org
I ran some more tests. The disk is 3TB, and roughly the first 225GB is copied without errors.
Then the 'No space left on device' errors begin.
Now if I use rsync with the '--bwlimit' option, no error occurs. If I
choose 'Retry' in Midnight Commander, the copy continues,
and after a while another error occurs, then 'Retry' again, and so on.
I also noticed something else. Just before this error occurs, the write()
call takes a lot longer, and I also see that progress stops.
If I run 'btrfs fi df ..' at this moment:
Data: total=114.01GB, used=112.00GB
System, DUP: total=8.00MB, used=20.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=452.93MB
Metadata: total=8.00MB, used=0.00
then the error is reported, and 'btrfs fi df ..' again shows:
Data: total=114.01GB, used=113.00GB
System, DUP: total=8.00MB, used=20.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=456.98MB
Metadata: total=8.00MB, used=0.00
and then 'Retry' and it goes on, until the next error.
Before each error, copying always stops and the used Data and Metadata values change.
Maybe it has something to do with allocating metadata. I don't know.
This goes on for about 25G-30G, and after that there are no more errors.
After this, 1.3TB was copied without errors. But some of that data was on
a rather slow disk, so maybe that's why there were no more errors.
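The two 'btrfs fi df' snapshots above can be compared mechanically. The following Python sketch parses output in the format shown in this thread and reports how much the "used" figure changed per chunk type. The helper names (parse_fi_df, used_delta_mb) are made up for illustration, and the parser only assumes the line layout seen here; other btrfs-progs versions may print something different.

```python
import re

# Matches lines such as "Metadata, DUP: total=1.00GB, used=452.93MB".
# The unit may be missing, as in "used=0.00".
LINE_RE = re.compile(
    r"^(?P<kind>[^:]+): total=(?P<total>[\d.]+)(?P<tunit>[KMGT]?B)?, "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]?B)?$"
)

# Conversion factors to MB (binary units); a missing unit is treated as bytes.
UNIT_MB = {None: 1 / (1024 * 1024), "KB": 1 / 1024, "MB": 1.0,
           "GB": 1024.0, "TB": 1024.0 * 1024}


def parse_fi_df(text):
    """Return {chunk type: (total_mb, used_mb)} for each recognised line."""
    usage = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            total = float(m["total"]) * UNIT_MB[m["tunit"]]
            used = float(m["used"]) * UNIT_MB[m["uunit"]]
            usage[m["kind"]] = (total, used)
    return usage


def used_delta_mb(before, after):
    """Change in used space (MB, rounded to 2 places) per chunk type."""
    a, b = parse_fi_df(before), parse_fi_df(after)
    return {kind: round(b[kind][1] - a[kind][1], 2) for kind in a if kind in b}
```

Fed the two snapshots above, this gives a change of 1024.0 MB for Data and 4.05 MB for "Metadata, DUP", while the totals (114.01GB and 1.00GB) are the same in both.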
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-27 21:53 ` Igor M
@ 2013-10-27 22:46 ` Chris Murphy
2013-10-27 23:27 ` Chris Murphy
0 siblings, 1 reply; 20+ messages in thread
From: Chris Murphy @ 2013-10-27 22:46 UTC (permalink / raw)
To: Btrfs BTRFS
On Oct 27, 2013, at 3:53 PM, Igor M <igork20@gmail.com> wrote:
> I made some more tests. Disk is 3TB, first cca 225GB is copied without errors.
> Then errors 'No space left on device' begins.
Please post the full dmesg somewhere. pastebin.com is one option.
Chris Murphy
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-27 22:46 ` Chris Murphy
@ 2013-10-27 23:27 ` Chris Murphy
2013-10-28 7:40 ` Igor M
0 siblings, 1 reply; 20+ messages in thread
From: Chris Murphy @ 2013-10-27 23:27 UTC (permalink / raw)
To: Btrfs BTRFS
On Oct 27, 2013, at 4:46 PM, Chris Murphy <lists@colorremedies.com> wrote:
>
> On Oct 27, 2013, at 3:53 PM, Igor M <igork20@gmail.com> wrote:
>
>> I made some more tests. Disk is 3TB, first cca 225GB is copied without errors.
>> Then errors 'No space left on device' begins.
>
> Post the full entire dmesg somewhere please. pastebin.com is one option.
And, either on the list or in a paste, the output for the disk from:
smartctl -x /dev/sdX
Chris Murphy
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-27 23:27 ` Chris Murphy
@ 2013-10-28 7:40 ` Igor M
2013-10-28 17:57 ` Chris Murphy
0 siblings, 1 reply; 20+ messages in thread
From: Igor M @ 2013-10-28 7:40 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Mon, Oct 28, 2013 at 12:27 AM, Chris Murphy <lists@colorremedies.com> wrote:
>
> On Oct 27, 2013, at 4:46 PM, Chris Murphy <lists@colorremedies.com> wrote:
>
>>
>> On Oct 27, 2013, at 3:53 PM, Igor M <igork20@gmail.com> wrote:
>>
>>> I made some more tests. Disk is 3TB, first cca 225GB is copied without errors.
>>> Then errors 'No space left on device' begins.
>>
>> Post the full entire dmesg somewhere please. pastebin.com is one option.
>
>
> And on list or pasted, the output for the disk from:
>
> smartctl -x /dev/sdX
>
dmesg: http://pastebin.com/t2H1QYye
source disk: http://pastebin.com/JqKxkxKr
dest disk: http://pastebin.com/ez9jALS2
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-28 7:40 ` Igor M
@ 2013-10-28 17:57 ` Chris Murphy
2013-10-28 19:07 ` Igor M
0 siblings, 1 reply; 20+ messages in thread
From: Chris Murphy @ 2013-10-28 17:57 UTC (permalink / raw)
To: Btrfs BTRFS
On Oct 28, 2013, at 1:40 AM, Igor M <igork20@gmail.com> wrote:
>
> dmesg: http://pastebin.com/t2H1QYye
You've got a warning related to the PCIe bridge at boot, followed by a trace. I don't know whether this could be related to the problem.
[ 0.325976] ------------[ cut here ]------------
[ 0.326086] WARNING: CPU: 5 PID: 1 at drivers/pci/search.c:46 pci_find_upstream_pcie_bridge+0x50/0x70()
[ 0.326263] Modules linked in:
[ 0.326412] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 3.11.6 #1
[ 0.326520] Hardware name: System manufacturer System Product Name/P8H77-V LE, BIOS 0601 06/06/2012
[ 0.326695] 0000000000000000 0000000000000009 ffffffff81449f86 0000000000000000
[ 0.327032] ffffffff810514b1 ffff88040b031898 ffff88040b031800 ffff88040b031898
[ 0.327367] 0000000080000000 000077ff80000000 ffffffff812191a0 0000000000000004
[ 0.327704] Call Trace:
[ 0.327819] [<ffffffff81449f86>] ? dump_stack+0x41/0x51
[ 0.327931] [<ffffffff810514b1>] ? warn_slowpath_common+0x81/0xb0
[ 0.328040] [<ffffffff812191a0>] ? pci_find_upstream_pcie_bridge+0x50/0x70
[ 0.328151] [<ffffffff813333d3>] ? intel_iommu_add_device+0x43/0x210
[ 0.328261] [<ffffffff81330510>] ? bus_set_iommu+0x60/0x60
[ 0.328370] [<ffffffff8133053c>] ? add_iommu_group+0x2c/0x60
[ 0.328481] [<ffffffff81292a3d>] ? bus_for_each_dev+0x4d/0x80
[ 0.328591] [<ffffffff813304fa>] ? bus_set_iommu+0x4a/0x60
[ 0.328701] [<ffffffff8163bc0d>] ? intel_iommu_init+0xb20/0xc45
[ 0.328812] [<ffffffff81612919>] ? unpack_to_rootfs+0x24b/0x25b
[ 0.328922] [<ffffffff816163c6>] ? pci_iommu_init+0xe/0x37
[ 0.329031] [<ffffffff816163b8>] ? memblock_find_dma_reserve+0x148/0x148
[ 0.329142] [<ffffffff810002f2>] ? do_one_initcall+0x102/0x150
[ 0.329252] [<ffffffff81611e43>] ? kernel_init_freeable+0xfd/0x18e
[ 0.329362] [<ffffffff816117cf>] ? do_early_param+0x83/0x83
[ 0.329471] [<ffffffff81444480>] ? rest_init+0x70/0x70
[ 0.329579] [<ffffffff81444489>] ? kernel_init+0x9/0xe0
[ 0.329688] [<ffffffff8144f36c>] ? ret_from_fork+0x7c/0xb0
[ 0.329797] [<ffffffff81444480>] ? rest_init+0x70/0x70
[ 0.329907] ---[ end trace 0946f959337cff8b ]---
There are also numerous ACPI errors.
ACPI Error: [DSSP] Namespace lookup failure
check this:
http://forums.gentoo.org/viewtopic-t-960476-start-0.html
Anyway, I don't see any read or write failures for any of the drives which is what I was kinda expecting.
>
> dest disk: http://pastebin.com/ez9jALS2
This is a new drive with only 71 power-on hours, yet I'm seeing this:
• 0x0009 2 27 Transition from drive PhyRdy to drive PhyNRdy
• 0x000a 2 27 Device-to-host register FISes sent due to a COMRESET
That's unexpected, but I don't know that it's related. The dmesg doesn't report any phy issues with the drive. Maybe check syslog or journalctl with a case-insensitive search for phy and see if you find anything.
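A case-insensitive search like the one suggested can be sketched in a few lines of Python. The function name find_phy_lines is hypothetical, and the input is assumed to be log text that has already been saved to a file or captured as lines. Piping the log through grep -i phy in a shell would do the same job.

```python
import re

# Case-insensitive, so it matches "phy", "PHY", "PhyRdy", "PhyNRdy", ...
# Note that it also matches unrelated words such as "physical".
PHY_RE = re.compile(r"phy", re.IGNORECASE)


def find_phy_lines(lines):
    """Return the lines that mention 'phy' in any letter case."""
    return [line.rstrip("\n") for line in lines if PHY_RE.search(line)]
```

It can be called with an open file object, for example on a saved copy of the dmesg or syslog output.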
Chris Murphy
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-27 8:50 ` Igor M
2013-10-27 8:56 ` Brendan Hide
2013-10-27 21:53 ` Igor M
@ 2013-10-28 19:05 ` Josef Bacik
2013-10-28 19:15 ` Igor M
2 siblings, 1 reply; 20+ messages in thread
From: Josef Bacik @ 2013-10-28 19:05 UTC (permalink / raw)
To: Igor M; +Cc: Tomasz Chmielewski, linux-btrfs@vger.kernel.org
On Sun, Oct 27, 2013 at 09:50:37AM +0100, Igor M wrote:
> On Sun, Oct 27, 2013 at 2:00 AM, Tomasz Chmielewski <tch@virtall.com> wrote:
> >> Still no messages. Parameter seems to be active as
> >> /sys/module/printk/parameters/ignore_loglevel is Y, but there are no
> >> messages in log files or dmesg. Maybe I need to turn on some kernel
> >> debugging option and recompile kernel ?
> >> Also I should mention that cca 230G+ data was copied before this error
> >> started to occur.
> >
> > I think I saw a similar issue before.
> >
> > Can you try using rsync with "--bwlimit XY" option to copy the files?
> >
> > The option will limit the speed, in kB, at which the file is being
> > copied; it will work even when source and destination files are on a
> > local machine.
> >
>
> Also I run strace cp -a ..
> ...
> read(3, "350348f07$0$24520$c3e8da3$fb4835"..., 65536) = 65536
> write(4, "350348f07$0$24520$c3e8da3$fb4835"..., 65536) = 65536
> read(3, "62.76C52BF412E849CB86D4FF3898B94"..., 65536) = 65536
> write(4, "62.76C52BF412E849CB86D4FF3898B94"..., 65536) = -1 ENOSPC (No
> space left on device)
>
> Last two write calls take a lot more time, and then last one returns
> ENOSPC. But if this write is retryed, then it succeeds.
> I tried with midnight commander and when error occurs, if I Retry
> operation then it finishes copying this file until error occurs again
> at next file.
>
> With --bwlimit it seems to be better, lower the speed later the error
> occurs, and if it's slow enough copy is successfull.
> But now I'm not sure anymore. I copied a few files with bwlimit, and
> now sudenly error doesn't occur anymore, even with no bwlimit.
> I'll do some more tests.
I just sent a patch to the list
[PATCH] Btrfs: make sure the delalloc workers actually flush compressed writes
Can you run this patch and see if it makes a difference for your test? Thanks,
Josef
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: No space left on device, problem
2013-10-28 17:57 ` Chris Murphy
@ 2013-10-28 19:07 ` Igor M
0 siblings, 0 replies; 20+ messages in thread
From: Igor M @ 2013-10-28 19:07 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Mon, Oct 28, 2013 at 6:57 PM, Chris Murphy <lists@colorremedies.com> wrote:
>
> On Oct 28, 2013, at 1:40 AM, Igor M <igork20@gmail.com> wrote:
>>
>> dmesg: http://pastebin.com/t2H1QYye
>
>
> You've got a warning related to pcie bridge on boot, with a trace that follows. I don't know if this could be related to some problems.
>
> [ 0.325976] ------------[ cut here ]------------
> [ 0.326086] WARNING: CPU: 5 PID: 1 at drivers/pci/search.c:46 pci_find_upstream_pcie_bridge+0x50/0x70()
> [ 0.326263] Modules linked in:
> [ 0.326412] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 3.11.6 #1
> [ 0.326520] Hardware name: System manufacturer System Product Name/P8H77-V LE, BIOS 0601 06/06/2012
> [ 0.326695] 0000000000000000 0000000000000009 ffffffff81449f86 0000000000000000
> [ 0.327032] ffffffff810514b1 ffff88040b031898 ffff88040b031800 ffff88040b031898
> [ 0.327367] 0000000080000000 000077ff80000000 ffffffff812191a0 0000000000000004
> [ 0.327704] Call Trace:
> [ 0.327819] [<ffffffff81449f86>] ? dump_stack+0x41/0x51
> [ 0.327931] [<ffffffff810514b1>] ? warn_slowpath_common+0x81/0xb0
> [ 0.328040] [<ffffffff812191a0>] ? pci_find_upstream_pcie_bridge+0x50/0x70
> [ 0.328151] [<ffffffff813333d3>] ? intel_iommu_add_device+0x43/0x210
> [ 0.328261] [<ffffffff81330510>] ? bus_set_iommu+0x60/0x60
> [ 0.328370] [<ffffffff8133053c>] ? add_iommu_group+0x2c/0x60
> [ 0.328481] [<ffffffff81292a3d>] ? bus_for_each_dev+0x4d/0x80
> [ 0.328591] [<ffffffff813304fa>] ? bus_set_iommu+0x4a/0x60
> [ 0.328701] [<ffffffff8163bc0d>] ? intel_iommu_init+0xb20/0xc45
> [ 0.328812] [<ffffffff81612919>] ? unpack_to_rootfs+0x24b/0x25b
> [ 0.328922] [<ffffffff816163c6>] ? pci_iommu_init+0xe/0x37
> [ 0.329031] [<ffffffff816163b8>] ? memblock_find_dma_reserve+0x148/0x148
> [ 0.329142] [<ffffffff810002f2>] ? do_one_initcall+0x102/0x150
> [ 0.329252] [<ffffffff81611e43>] ? kernel_init_freeable+0xfd/0x18e
> [ 0.329362] [<ffffffff816117cf>] ? do_early_param+0x83/0x83
> [ 0.329471] [<ffffffff81444480>] ? rest_init+0x70/0x70
> [ 0.329579] [<ffffffff81444489>] ? kernel_init+0x9/0xe0
> [ 0.329688] [<ffffffff8144f36c>] ? ret_from_fork+0x7c/0xb0
> [ 0.329797] [<ffffffff81444480>] ? rest_init+0x70/0x70
> [ 0.329907] ---[ end trace 0946f959337cff8b ]---
>
>
> There are also numerous ACPI errors.
>
> ACPI Error: [DSSP] Namespace lookup failure
>
> check this:
> http://forums.gentoo.org/viewtopic-t-960476-start-0.html
>
>
> Anyway, I don't see any read or write failures for any of the drives which is what I was kinda expecting.
>
>
>>
>> dest disk: http://pastebin.com/ez9jALS2
>
> This is a new drive with only 71 power on hours yet I'm seeing this:
>
> • 0x0009 2 27 Transition from drive PhyRdy to drive PhyNRdy
> • 0x000a 2 27 Device-to-host register FISes sent due to a COMRESET
>
>
>
> That's unexpected but I don't know that it's related. The dmesg doesn't report any phy issues with the drive. Maybe check syslog or journalctl with a case-insensitive search for phy and see if you find anything.
>
>
>
>
> Chris Murphy
>
The drive should be OK. As for the pcie_bridge warning, I'm not sure how to
fix it; otherwise this computer is stable.
The error does not occur if compression is turned off or if a
non-compressible file is copied.
I'll try on a different computer and see whether the same thing happens.
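
The log search Chris suggests above could be done roughly as follows. This is only a sketch: find_phy_events is a hypothetical helper name, and which log source exists (dmesg, journalctl, /var/log/syslog) depends on the system. Note that a plain match on "phy" also hits unrelated words such as "physical", so the output still needs to be read by eye.

```shell
# Hypothetical helper: case-insensitive search for "phy" in log text on stdin.
# Reading stdin keeps it independent of where the log comes from.
find_phy_events() {
    # grep exits non-zero when nothing matches; treat that as "no events".
    grep -i 'phy' || true
}

# Usage (pick whichever log source the system provides):
#   dmesg | find_phy_events
#   journalctl -k | find_phy_events
#   find_phy_events < /var/log/syslog
```

An empty result would agree with the observation that dmesg shows no phy problems for the drive.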
* Re: No space left on device, problem
2013-10-28 19:05 ` Josef Bacik
@ 2013-10-28 19:15 ` Igor M
0 siblings, 0 replies; 20+ messages in thread
From: Igor M @ 2013-10-28 19:15 UTC (permalink / raw)
To: Josef Bacik; +Cc: Tomasz Chmielewski, linux-btrfs@vger.kernel.org
On Mon, Oct 28, 2013 at 8:05 PM, Josef Bacik <jbacik@fusionio.com> wrote:
> On Sun, Oct 27, 2013 at 09:50:37AM +0100, Igor M wrote:
>> On Sun, Oct 27, 2013 at 2:00 AM, Tomasz Chmielewski <tch@virtall.com> wrote:
>> >> Still no messages. Parameter seems to be active as
>> >> /sys/module/printk/parameters/ignore_loglevel is Y, but there are no
>> >> messages in log files or dmesg. Maybe I need to turn on some kernel
>> >> debugging option and recompile kernel ?
>> >> Also I should mention that about 230G of data was copied before this error
>> >> started to occur.
>> >
>> > I think I saw a similar issue before.
>> >
>> > Can you try using rsync with "--bwlimit XY" option to copy the files?
>> >
>> > The option will limit the speed, in kB, at which the file is being
>> > copied; it will work even when source and destination files are on a
>> > local machine.
>> >
>>
>> I also ran strace cp -a ..
>> ...
>> read(3, "350348f07$0$24520$c3e8da3$fb4835"..., 65536) = 65536
>> write(4, "350348f07$0$24520$c3e8da3$fb4835"..., 65536) = 65536
>> read(3, "62.76C52BF412E849CB86D4FF3898B94"..., 65536) = 65536
>> write(4, "62.76C52BF412E849CB86D4FF3898B94"..., 65536) = -1 ENOSPC (No space left on device)
>>
>> The last two write calls take a lot more time, and then the last one returns
>> ENOSPC. But if this write is retried, it succeeds.
>> I tried with Midnight Commander: when the error occurs and I retry the
>> operation, it finishes copying this file, until the error occurs again
>> at the next file.
>>
>> With --bwlimit it seems to be better: the lower the speed, the later the error
>> occurs, and if it's slow enough the copy is successful.
>> But now I'm not sure anymore. I copied a few files with bwlimit, and
>> now suddenly the error doesn't occur anymore, even with no bwlimit.
>> I'll do some more tests.
>
> I just sent a patch to the list
>
> [PATCH] Btrfs: make sure the delalloc workers actually flush compressed writes
>
> Can you run this patch and see if it makes a difference for your test? Thanks,
>
> Josef
I'll try with this patch.
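
Until the patch is tested, the two workarounds mentioned in this thread (retrying the failed copy, and throttling with rsync --bwlimit) could be combined along these lines. This is a sketch under assumptions, not a fix: retry and RETRY_DELAY are hypothetical names, the /old/disk source path and the 20000 rate are made up for illustration, and the loop retries on any failure, not only ENOSPC.

```shell
# Hypothetical helper: retry TRIES COMMAND [ARGS...]
# Runs COMMAND until it succeeds or TRIES attempts have failed.
retry() {
    tries=$1
    shift
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        "$@" && return 0
        echo "attempt $attempt of $tries failed: $*" >&2
        # Pause between attempts only if another attempt remains.
        if [ "$attempt" -lt "$tries" ]; then
            sleep "${RETRY_DELAY:-2}"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Usage (source path and rate are assumptions):
#   retry 5 cp -a /old/disk/gbdata/parts_0015.MYI /usr/local/mysql/data/gbdata/
#   retry 5 rsync -a --bwlimit=20000 /old/disk/gbdata/ /usr/local/mysql/data/gbdata/
```

Because rsync skips files that already match, rerunning it after a failure repeats less work than rerunning cp -a on the whole tree.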
end of thread, other threads:[~2013-10-28 19:15 UTC | newest]
Thread overview: 20+ messages
2013-10-27 1:00 No space left on device, problem Tomasz Chmielewski
2013-10-27 8:50 ` Igor M
2013-10-27 8:56 ` Brendan Hide
2013-10-27 9:18 ` Igor M
2013-10-27 21:53 ` Igor M
2013-10-27 22:46 ` Chris Murphy
2013-10-27 23:27 ` Chris Murphy
2013-10-28 7:40 ` Igor M
2013-10-28 17:57 ` Chris Murphy
2013-10-28 19:07 ` Igor M
2013-10-28 19:05 ` Josef Bacik
2013-10-28 19:15 ` Igor M
-- strict thread matches above, loose matches on Subject: below --
2013-10-26 19:46 Igor M
2013-10-26 21:00 ` Igor M
2013-10-26 21:35 ` Chris Murphy
2013-10-26 21:53 ` Igor M
2013-10-26 22:17 ` Chris Murphy
2013-10-26 22:22 ` Igor M
2013-10-26 22:50 ` Igor M
2013-10-26 22:54 ` Igor M