Cephfs losing files and corrupting others

* Cephfs losing files and corrupting others
@ 2012-11-01 16:22 Nathan Howell
  2012-11-01 22:32 ` Sam Lang
  0 siblings, 1 reply; 10+ messages in thread
From: Nathan Howell @ 2012-11-01 16:22 UTC (permalink / raw)
  To: ceph-devel

We have a small (3 node) Ceph cluster that occasionally has issues. It
loses files and directories, truncates them or fills the contents with
NULL bytes. So far we haven't been able to build a repro case but it
seems to happen when bulk loading data into the cluster, a process
that is run each evening by a cron job. We've gone about a month
without any issues but had it happen again yesterday during a larger
bulk load.  The data is backed up outside of ceph and can be reloaded
but finding the corrupt files takes quite a while.
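Since locating the damaged files is the slow part, a scan along the following lines may help narrow things down. This is only a sketch, not anything from the Ceph tools: the function names `is_null_filled` and `find_suspect_files` are made up for illustration, and it assumes a damaged file is either empty or consists entirely of NUL bytes. Truncated files that still contain valid data would need a comparison against the backup copy instead.

```python
import os


def is_null_filled(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte in it is NUL."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            seen_data = True
            # Stop at the first chunk that contains a non-NUL byte.
            if chunk.count(b"\x00") != len(chunk):
                return False
    return seen_data


def find_suspect_files(root):
    """Yield (path, reason) for files that are empty or entirely NUL bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) == 0:
                    yield path, "empty"
                elif is_null_filled(path):
                    yield path, "null-filled"
            except OSError as exc:
                # Files that vanish or cannot be read are reported as well.
                yield path, f"unreadable: {exc}"
```

Running `for path, reason in find_suspect_files("/ceph"): print(reason, path)` would list candidates. Legitimately empty files show up too, so the output is a starting point rather than a verdict.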

Has anyone heard of similar issues before? Should I try upgrading to
0.48.2 or a newer kernel?

ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux _ 3.4.4-gentoo #2 SMP Sun Jul 1 18:28:16 UTC 2012 x86_64
Intel(R) Xeon(R) CPU E31240 @ 3.30GHz GenuineIntel GNU/Linux

I'm using the kernel provided cephfs, mounted with these options:
10.0.2.2:6789:/ on /ceph type ceph (rw,noatime,nodiratime)

thanks,
-n


* Re: Cephfs losing files and corrupting others
  2012-11-01 16:22 Cephfs losing files and corrupting others Nathan Howell
@ 2012-11-01 22:32 ` Sam Lang
  2012-11-01 23:02   ` Gregory Farnum
  2012-11-01 23:30   ` Nathan Howell
  0 siblings, 2 replies; 10+ messages in thread
From: Sam Lang @ 2012-11-01 22:32 UTC (permalink / raw)
  To: Nathan Howell; +Cc: ceph-devel

On Thu 01 Nov 2012 11:22:59 AM CDT, Nathan Howell wrote:
> We have a small (3 node) Ceph cluster that occasionally has issues. It
> loses files and directories, truncates them or fills the contents with
> NULL bytes. So far we haven't been able to build a repro case but it
> seems to happen when bulk loading data into the cluster, a process
> that is run each evening by a cron job. We've gone about a month
> without any issues but had it happen again yesterday during a larger
> bulk load.  The data is backed up outside of ceph and can be reloaded
> but finding the corrupt files takes quite a while.
>
> Has anyone heard of similar issues before? Should I try upgrading to
> 0.48.2 or a newer kernel?

Hi Nathan,

Do the writes succeed?  I.e. the programs creating the files don't get 
errors back?  Are you seeing any problems with the ceph mds or osd 
processes crashing?  Can you describe your I/O workload during these 
bulk loads?  How many files, how much data, multiple clients writing, 
etc.

As far as I know, there haven't been any fixes to 0.48.2 to resolve 
problems like yours.  You might try the ceph fuse client to see if you 
get the same behavior.  If not, then at least we have narrowed down the 
problem to the ceph kernel client.

Thanks,
-sam


* Re: Cephfs losing files and corrupting others
  2012-11-01 22:32 ` Sam Lang
@ 2012-11-01 23:02   ` Gregory Farnum
  2012-11-01 23:30   ` Nathan Howell
  1 sibling, 0 replies; 10+ messages in thread
From: Gregory Farnum @ 2012-11-01 23:02 UTC (permalink / raw)
  To: Nathan Howell, Sam Lang; +Cc: ceph-devel

On Thu, Nov 1, 2012 at 11:32 PM, Sam Lang <sam.lang@inktank.com> wrote:
> On Thu 01 Nov 2012 11:22:59 AM CDT, Nathan Howell wrote:
>>
>> We have a small (3 node) Ceph cluster that occasionally has issues. It
>> loses files and directories, truncates them or fills the contents with
>> NULL bytes. So far we haven't been able to build a repro case but it
>> seems to happen when bulk loading data into the cluster, a process
>> that is run each evening by a cron job. We've gone about a month
>> without any issues but had it happen again yesterday during a larger
>> bulk load.  The data is backed up outside of ceph and can be reloaded
>> but finding the corrupt files takes quite a while.
>>
>> Has anyone heard of similar issues before? Should I try upgrading to
>> 0.48.2 or a newer kernel?
>
>
> Hi Nathan,
>
> Do the writes succeed?  I.e. the programs creating the files don't get
> errors back?  Are you seeing any problems with the ceph mds or osd processes
> crashing?  Can you describe your I/O workload during these bulk loads?  How
> many files, how much data, multiple clients writing, etc.
>
> As far as I know, there haven't been any fixes to 0.48.2 to resolve problems
> like yours.  You might try the ceph fuse client to see if you get the same
> behavior.  If not, then at least we have narrowed down the problem to the
> ceph kernel client.

Are you using hard links, by any chance? Do you have one or many MDS
systems? What filesystem are you using on your OSDs?
-Greg


* Re: Cephfs losing files and corrupting others
  2012-11-01 22:32 ` Sam Lang
  2012-11-01 23:02   ` Gregory Farnum
@ 2012-11-01 23:30   ` Nathan Howell
  2012-11-02  2:37     ` Yan, Zheng 
  2012-11-03 16:54     ` Gregory Farnum
  1 sibling, 2 replies; 10+ messages in thread
From: Nathan Howell @ 2012-11-01 23:30 UTC (permalink / raw)
  To: Sam Lang, Gregory Farnum; +Cc: ceph-devel

On Thu, Nov 1, 2012 at 3:32 PM, Sam Lang <sam.lang@inktank.com> wrote:
> Do the writes succeed?  I.e. the programs creating the files don't get
> errors back?  Are you seeing any problems with the ceph mds or osd processes
> crashing?  Can you describe your I/O workload during these bulk loads?  How
> many files, how much data, multiple clients writing, etc.
>
> As far as I know, there haven't been any fixes to 0.48.2 to resolve problems
> like yours.  You might try the ceph fuse client to see if you get the same
> behavior.  If not, then at least we have narrowed down the problem to the
> ceph kernel client.

Yes, the writes succeed. Wednesday's failure looked like this:

1) rsync a 100-200 MB tarball directly into ceph from a remote site
2) untar ~500 files from tarball in ceph into a new directory in ceph
3) wait for a while
4) the .tar file and some log files disappeared but the untarred files were fine
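The four steps above could be turned into a repeatable check along these lines. This is a rough sketch under several assumptions: a plain local copy (`shutil.copy2`) stands in for rsync, the function names `load_tarball` and `recheck` are hypothetical, and only existence and size are compared, so content corruption that preserves the size would go unnoticed.

```python
import os
import shutil
import tarfile


def load_tarball(tarball, dest_dir):
    """Copy the tarball into dest_dir, extract it there, and return
    a {path: size} record of everything that was just written."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = shutil.copy2(tarball, dest_dir)  # stand-in for the rsync step
    extract_dir = os.path.join(dest_dir, "extracted")
    with tarfile.open(copied) as tar:
        tar.extractall(extract_dir)
    expected = {copied: os.path.getsize(copied)}
    for dirpath, _dirnames, filenames in os.walk(extract_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            expected[path] = os.path.getsize(path)
    return expected


def recheck(expected):
    """Return a list of (path, problem) for files that have since
    disappeared or changed size."""
    problems = []
    for path, size in sorted(expected.items()):
        if not os.path.exists(path):
            problems.append((path, "missing"))
        elif os.path.getsize(path) != size:
            problems.append((path, "size changed"))
    return problems
```

Calling `load_tarball()` once, waiting, and then calling `recheck()` on the returned record mirrors steps 1 through 4. Saving the record to disk and running `recheck()` from a second client would also show whether the files are missing everywhere or only on the machine that wrote them.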

Total filesystem size is:

pgmap v2221244: 960 pgs: 960 active+clean; 2418 GB data, 7293 GB used,
6151 GB / 13972 GB avail

Generally our load looks like:

Constant trickle of 1-2 MB files from 3 machines, about 1 GB per day
total. No file is written to by more than 1 machine, but the files go
into shared directories.

Grid jobs are running constantly and are doing sequential reads from
the filesystem. Compute nodes have the filesystem mounted read-only.
They're primarily located at a remote site (~40ms away) and tend to
average 1-2 megabits/sec.

Nightly data jobs load in ~10 GB from a few remote sites into fewer than 10
large files. These are split up into about 1000 smaller files but the
originals are also kept. All of this is done on one machine. The
journals and osd drives are write saturated while this is going on.

On Thu, Nov 1, 2012 at 4:02 PM, Gregory Farnum <greg@inktank.com> wrote:
> Are you using hard links, by any chance?

No, we are using a handful of soft links though.

> Do you have one or many MDS systems?

ceph mds stat says: e686: 1/1/1 up {0=xxx=up:active}, 2 up:standby

> What filesystem are you using on your OSDs?

btrfs

thanks,
-n


* Re: Cephfs losing files and corrupting others
  2012-11-01 23:30   ` Nathan Howell
@ 2012-11-02  2:37     ` Yan, Zheng 
  2012-11-03 16:54     ` Gregory Farnum
  1 sibling, 0 replies; 10+ messages in thread
From: Yan, Zheng  @ 2012-11-02  2:37 UTC (permalink / raw)
  To: Nathan Howell; +Cc: Sam Lang, Gregory Farnum, ceph-devel

On Fri, Nov 2, 2012 at 7:30 AM, Nathan Howell <nathan.d.howell@gmail.com> wrote:
> On Thu, Nov 1, 2012 at 3:32 PM, Sam Lang <sam.lang@inktank.com> wrote:
>> Do the writes succeed?  I.e. the programs creating the files don't get
>> errors back?  Are you seeing any problems with the ceph mds or osd processes
>> crashing?  Can you describe your I/O workload during these bulk loads?  How
>> many files, how much data, multiple clients writing, etc.
>>
>> As far as I know, there haven't been any fixes to 0.48.2 to resolve problems
>> like yours.  You might try the ceph fuse client to see if you get the same
>> behavior.  If not, then at least we have narrowed down the problem to the
>> ceph kernel client.
>
> Yes, the writes succeed. Wednesday's failure looked like this:
>
> 1) rsync 100-200mb tarball directly into ceph from a remote site
> 2) untar ~500 files from tarball in ceph into a new directory in ceph
> 3) wait for a while
> 4) the .tar file and some log files disappeared but the untarred files were fine
>
> Total filesystem size is:
>
> pgmap v2221244: 960 pgs: 960 active+clean; 2418 GB data, 7293 GB used,
> 6151 GB / 13972 GB avail
>
> Generally our load looks like:
>
> Constant trickle of 1-2mb files from 3 machines, about 1GB per day
> total. No file is written to by more than 1 machine, but the files go
> into shared directories.
>
> Grid jobs are running constantly and are doing sequential reads from
> the filesystem. Compute nodes have the filesystem mounted read-only.
> They're primarily located at a remote site (~40ms away) and tend to
> average 1-2 megabits/sec.
>
> Nightly data jobs load in ~10GB from a few remote sites in to <10
> large files. These are split up into about 1000 smaller files but the
> originals are also kept. All of this is done on one machine. The
> journals and osd drives are write saturated while this is going on.
>
>
> On Thu, Nov 1, 2012 at 4:02 PM, Gregory Farnum <greg@inktank.com> wrote:
>> Are you using hard links, by any chance?
>
> No, we are using a handful of soft links though.
>
>
>> Do you have one or many MDS systems?
>
> ceph mds stat says: e686: 1/1/1 up {0=xxx=up:active}, 2 up:standby
>
>
>> What filesystem are you using on your OSDs?
>
> btrfs
>
>

My recent patch "ceph: Fix i_size update race" can probably fix the
truncated file issue.

Yan, Zheng


* Re: Cephfs losing files and corrupting others
  2012-11-01 23:30   ` Nathan Howell
  2012-11-02  2:37     ` Yan, Zheng 
@ 2012-11-03 16:54     ` Gregory Farnum
       [not found]       ` <CAD84eiEDMXiXf8aFojpAFJPt=5DVZNFbnNq9BnJBxMzRrdNjrw@mail.gmail.com>
  1 sibling, 1 reply; 10+ messages in thread
From: Gregory Farnum @ 2012-11-03 16:54 UTC (permalink / raw)
  To: Nathan Howell, Samuel Just; +Cc: Sam Lang, ceph-devel

On Fri, Nov 2, 2012 at 12:30 AM, Nathan Howell
<nathan.d.howell@gmail.com> wrote:
> On Thu, Nov 1, 2012 at 3:32 PM, Sam Lang <sam.lang@inktank.com> wrote:
>> Do the writes succeed?  I.e. the programs creating the files don't get
>> errors back?  Are you seeing any problems with the ceph mds or osd processes
>> crashing?  Can you describe your I/O workload during these bulk loads?  How
>> many files, how much data, multiple clients writing, etc.
>>
>> As far as I know, there haven't been any fixes to 0.48.2 to resolve problems
>> like yours.  You might try the ceph fuse client to see if you get the same
>> behavior.  If not, then at least we have narrowed down the problem to the
>> ceph kernel client.
>
> Yes, the writes succeed. Wednesday's failure looked like this:
>
> 1) rsync 100-200mb tarball directly into ceph from a remote site
> 2) untar ~500 files from tarball in ceph into a new directory in ceph
> 3) wait for a while
> 4) the .tar file and some log files disappeared but the untarred files were fine

Just to be clear, you copied a tarball into Ceph and untarred all in
Ceph, and the extracted contents were fine but the tarball
disappeared? So this looks like a case of successfully-written files
disappearing?
Did you at any point check the tarball from a machine other than the
initial client that copied it in?

This truncation sounds like something Yan's fix may deal with. But if
you've also seen files that have the proper size but are empty or
corrupted, that sounds like an OSD bug. Sam, are you aware of any btrfs
issues that could cause this?

Nathan, you've also seen parts of the filesystem hierarchy get lost?
That's rather more concerning; under what circumstances have you seen
that?
-Greg


* Re: Cephfs losing files and corrupting others
       [not found]       ` <CAD84eiEDMXiXf8aFojpAFJPt=5DVZNFbnNq9BnJBxMzRrdNjrw@mail.gmail.com>
@ 2012-11-23  7:37         ` Nathan Howell
  2012-11-25 20:45           ` Nathan Howell
  0 siblings, 1 reply; 10+ messages in thread
From: Nathan Howell @ 2012-11-23  7:37 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Samuel Just, Sam Lang, ceph-devel

I upgraded to 0.54 and now there are some hints in the logs. The
directories referenced in the log entries are now missing:

2012-11-23 07:28:04.802864 mds.0 [ERR] loaded dup inode 1000000662f
[2,head] v3851654 at /xxx/20120203, but inode 1000000662f.head
v3853093 already exists at ~mds0/stray7/1000000662f
2012-11-23 07:28:04.802889 mds.0 [ERR] loaded dup inode 10000003a4b
[2,head] v431518 at /xxx/20120206, but inode 10000003a4b.head v3853192
already exists at ~mds0/stray8/10000003a4b
2012-11-23 07:28:04.802909 mds.0 [ERR] loaded dup inode 1000000149e
[2,head] v431522 at /xxx/20120207, but inode 1000000149e.head v3853206
already exists at ~mds0/stray8/1000000149e
2012-11-23 07:28:04.802927 mds.0 [ERR] loaded dup inode 10000000a5f
[2,head] v431526 at /xxx/20120208, but inode 10000000a5f.head v3853208
already exists at ~mds0/stray8/10000000a5f
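For anyone wanting to collect these entries from a larger log, a parser along the following lines might work. It is a sketch based only on the four lines quoted above: the regular expression and the name `parse_dup_inodes` are my own, the field names are guesses at what each value means, and paths containing whitespace would not be handled.

```python
import re

# Pattern derived from the "loaded dup inode" lines quoted above.
DUP_INODE_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<daemon>mds\.\d+) \[ERR\] loaded dup inode (?P<inode>[0-9a-f]+) "
    r"\[[^\]]*\] v(?P<version>\d+) at (?P<path>\S+), "
    r"but inode (?P=inode)\.head v(?P<stray_version>\d+) "
    r"already exists at (?P<stray_path>\S+)"
)


def parse_dup_inodes(log_text):
    """Return one dict per 'loaded dup inode' entry found in log_text."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in DUP_INODE_RE.finditer(flat)]
```

Each resulting dict pairs the directory path with the stray entry that holds the same inode number, which makes it easier to list the affected directories in one place.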

Any ideas?

On Thu, Nov 15, 2012 at 11:00 AM, Nathan Howell
<nathan.d.howell@gmail.com> wrote:
> Yes, successfully written files were disappearing. We switched to ceph-fuse
> and haven't seen any files truncated since. Older files (written months ago)
> are still having their entire contents replaced with NULL bytes, seemingly at
> random. I can't yet say for sure this has happened since switching over to
> fuse... but we think it has.
>
> I'm going to test all of the archives over the next few days and restore
> them from S3, so we should be back in a known-good state after that. In the
> event more files end up corrupted, is there any logging that I can enable
> that would help track down the problem?
>
> thanks,
> -n


* Re: Cephfs losing files and corrupting others
  2012-11-23  7:37         ` Nathan Howell
@ 2012-11-25 20:45           ` Nathan Howell
  2012-12-04 21:57             ` Gregory Farnum
  0 siblings, 1 reply; 10+ messages in thread
From: Nathan Howell @ 2012-11-25 20:45 UTC (permalink / raw)
  To: Gregory Farnum, ceph-devel; +Cc: Samuel Just, Sam Lang

So when trawling through the filesystem doing checksum validation
these popped up on the files that are filled with null bytes:
https://gist.github.com/186ad4c5df816d44f909
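For reference, checksum validation of this kind can be sketched roughly as below. This is not the script that was actually used: the helper names are hypothetical, SHA-256 is an arbitrary choice of digest, and it assumes a known-good copy of the tree (for example a restored backup) is available to build the reference manifest from.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def compare_manifests(reference, current):
    """Return (missing, changed): paths absent from current, and paths
    present in both whose digests differ."""
    missing = sorted(set(reference) - set(current))
    changed = sorted(
        p for p in reference if p in current and reference[p] != current[p]
    )
    return missing, changed
```

Building one manifest from the backup and one from the CephFS mount, then comparing them, separates files that have vanished from files whose contents have changed.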

Is there any way to fsck today? Looks like feature #86
http://tracker.newdream.net/issues/86 isn't implemented yet.

thanks,
-n


* Re: Cephfs losing files and corrupting others
  2012-11-25 20:45           ` Nathan Howell
@ 2012-12-04 21:57             ` Gregory Farnum
  2012-12-05  1:23               ` Gregory Farnum
  0 siblings, 1 reply; 10+ messages in thread
From: Gregory Farnum @ 2012-12-04 21:57 UTC (permalink / raw)
  To: Nathan Howell; +Cc: ceph-devel@vger.kernel.org, Samuel Just, Sam Lang

On Sun, Nov 25, 2012 at 12:45 PM, Nathan Howell
<nathan.d.howell@gmail.com> wrote:
> So when trawling through the filesystem doing checksum validation
> these popped up on the files that are filled with null bytes:
> https://gist.github.com/186ad4c5df816d44f909
>
> Is there any way to fsck today? Looks like feature #86
> http://tracker.newdream.net/issues/86 isn't implemented yet.

Yeah, unfortunately there isn't — fsck is one of those things that we
want to do as we prepare CephFS for production use, but we're only now
starting to move back in that direction.

The error printouts you're seeing indicate that...actually, I don't
know what they mean in this context. Hrm. In any case, Zheng Yan
contributed some patches that could impact a number of these issues,
but I still don't see how the NULL bytes could enter into it from our
end. If you can afford the disk space required to turn on "debug osd =
10" on the OSDs, and "debug mds = 10" on the MDS, that might give us a
clue about what's going on, if we manage to grab the logs that overlap
with the bad event (or at least the detection of it). You'll certainly
want to enable log rotation, though — that will generate some very
large logs.
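As a sketch, and assuming the settings go into ceph.conf on the relevant hosts under the usual per-daemon-type sections (the section placement here is my assumption, only the two option lines are quoted from above), that would look something like:

```ini
[osd]
    debug osd = 10

[mds]
    debug mds = 10
```

The daemons would need to pick up the new configuration before the extra logging takes effect, and log rotation has to be set up separately.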

Sorry for the slow turnaround time on this, our attention is being
pulled in a lot of directions besides CephFS and this is going to be a
hard one.
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Cephfs losing files and corrupting others
  2012-12-04 21:57             ` Gregory Farnum
@ 2012-12-05  1:23               ` Gregory Farnum
  0 siblings, 0 replies; 10+ messages in thread
From: Gregory Farnum @ 2012-12-05  1:23 UTC (permalink / raw)
  To: Nathan Howell; +Cc: ceph-devel@vger.kernel.org, Samuel Just, Sam Lang

On Tue, Dec 4, 2012 at 1:57 PM, Gregory Farnum <greg@inktank.com> wrote:
> On Sun, Nov 25, 2012 at 12:45 PM, Nathan Howell
> <nathan.d.howell@gmail.com> wrote:
>> So when trawling through the filesystem doing checksum validation
>> these popped up on the files that are filled with null bytes:
>> https://gist.github.com/186ad4c5df816d44f909
>>
>> Is there any way to fsck today? Looks like feature #86
>> http://tracker.newdream.net/issues/86 isn't implemented yet.
>
> Yeah, unfortunately there isn't — fsck is one of those things that we
> want to do as we prepare CephFS for production use, but we're only now
> starting to move back in that direction.
>
> The error printouts you're seeing indicate that...actually, I don't
> know what they mean in this context. Hrm. In any case, Zheng Yan
> contributed some patches that could impact a number of these issues,
> but I still don't see how the NULL bytes could enter into it from our
> end.

Oooh, actually, Zheng's patches are definitely related to this issue.
If you can try the "next" branch, that might resolve it going forward
(it won't repair current damage, though).
-Greg

