linux-btrfs.vger.kernel.org archive mirror
* Debian 3.7.1 BTRFS crash
@ 2013-03-13  1:38 Russell Coker
  2013-03-13  1:56 ` Harald Glatt
  2013-03-13  2:03 ` Eric Sandeen
  0 siblings, 2 replies; 14+ messages in thread
From: Russell Coker @ 2013-03-13  1:38 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 577 bytes --]

I have a workstation running the Debian packaged 3.7.1 kernel from 24th 
December last year.  After some period of uptime (maybe months) it crashed and 
mounted the root filesystem read-only.  Now when I boot it the root filesystem 
gets mounted read-only.

I have attached the dmesg output from the last boot.

The system has an Intel 120G SSD and apart from 4G of swap and 400M of /boot 
it's all a single encrypted BTRFS filesystem.

Any suggestions on what I should do next?

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

[-- Attachment #2: dmesg.txt.gz --]
[-- Type: application/x-gzip, Size: 17360 bytes --]


* Re: Debian 3.7.1 BTRFS crash
  2013-03-13  1:38 Debian 3.7.1 BTRFS crash Russell Coker
@ 2013-03-13  1:56 ` Harald Glatt
  2013-03-13  2:03 ` Eric Sandeen
  1 sibling, 0 replies; 14+ messages in thread
From: Harald Glatt @ 2013-03-13  1:56 UTC (permalink / raw)
  To: russell; +Cc: linux-btrfs

If you care about the data, create a backup if you haven't already
done so. Then you can try btrfsck, maybe you are in luck!

On Wed, Mar 13, 2013 at 2:38 AM, Russell Coker <russell@coker.com.au> wrote:
> I have a workstation running the Debian packaged 3.7.1 kernel from 24th
> December last year.  After some period of uptime (maybe months) it crashed and
> mounted the root filesystem read-only.  Now when I boot it the root filesystem
> gets mounted read-only.
>
> I have attached the dmesg output from the last boot.
>
> The system has an Intel 120G SSD and apart from 4G of swap and 400M of /boot
> it's all a single encrypted BTRFS filesystem.
>
> Any suggestions on what I should do next?
>
> --
> My Main Blog         http://etbe.coker.com.au/
> My Documents Blog    http://doc.coker.com.au/


* Re: Debian 3.7.1 BTRFS crash
  2013-03-13  1:38 Debian 3.7.1 BTRFS crash Russell Coker
  2013-03-13  1:56 ` Harald Glatt
@ 2013-03-13  2:03 ` Eric Sandeen
  2013-03-13  5:07   ` Jérôme Poulin
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2013-03-13  2:03 UTC (permalink / raw)
  To: russell; +Cc: linux-btrfs

On 3/12/13 8:38 PM, Russell Coker wrote:
> I have a workstation running the Debian packaged 3.7.1 kernel from 24th 
> December last year.  After some period of uptime (maybe months) it crashed and 
> mounted the root filesystem read-only.  Now when I boot it the root filesystem 
> gets mounted read-only.
> 
> I have attached the dmesg output from the last boot.
> 
> The system has an Intel 120G SSD and apart from 4G of swap and 400M of /boot 
> it's all a single encrypted BTRFS filesystem.
> 
> Any suggestions on what I should do next?

Not offhand, but I took a look at the logs, and maybe this will help the
people who are more guru-like than I am.

First you hit:

[   37.175750] btrfs: corrupt leaf, bad key order: block=70852288512,root=1, slot=8
[   37.176435] btrfs: corrupt leaf, bad key order: block=70852288512,root=1, slot=8

which led to an aborted transaction and an attempt at graceful shutdown:

[   37.176478] WARNING: at /build/buildd-linux_3.7.1-1~experimental.1-amd64-lU7Aeh/linux-3.7.1/fs/btrfs/super.c:246 __btrfs_abort_transaction+0x4c/0xcf [btrfs]()
[   37.176481] btrfs: Transaction aborted
...
[   37.176790] BTRFS error (device dm-0) in __btrfs_free_extent:5143: IO failure
[   37.176791] btrfs is forced readonly
[   37.176793] btrfs: run_one_delayed_ref returned -5


In the end, despite that attempt at a graceful exit, you hit:

[   37.937174] kernel BUG at /build/buildd-linux_3.7.1-1~experimental.1-amd64-lU7Aeh/linux-3.7.1/fs/btrfs/transaction.c:1753!

because in btrfs_clean_old_snapshots(), btrfs_drop_snapshot() failed, and

		BUG_ON(ret < 0);

it doesn't handle that well.
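
The log lines quoted above can be pulled out of a saved dmesg file mechanically. What follows is only a minimal sketch: the regular expression is written against the message wording shown in this thread (a 3.7-era kernel) and may not match other kernel versions, and the function names `find_corrupt_leaves` and `summarize` are made up for illustration.

```python
import re

# Matches the "corrupt leaf" wording quoted in this thread; other kernel
# versions may phrase the message differently (assumption).
CORRUPT_LEAF = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+btrfs: corrupt leaf, bad key order: "
    r"block=(?P<block>\d+),\s*root=(?P<root>\d+),\s*slot=(?P<slot>\d+)"
)


def find_corrupt_leaves(lines):
    """Return one dict per 'corrupt leaf, bad key order' line, in log order."""
    hits = []
    for line in lines:
        m = CORRUPT_LEAF.search(line)
        if m:
            hits.append({
                "time": float(m.group("time")),
                "block": int(m.group("block")),
                "root": int(m.group("root")),
                "slot": int(m.group("slot")),
            })
    return hits


def summarize(hits):
    """Count how often each (block, root, slot) triple was reported."""
    counts = {}
    for h in hits:
        key = (h["block"], h["root"], h["slot"])
        counts[key] = counts.get(key, 0) + 1
    return counts


# Lines copied from the excerpt above.
SAMPLE = [
    "[   37.175750] btrfs: corrupt leaf, bad key order: block=70852288512,root=1, slot=8",
    "[   37.176435] btrfs: corrupt leaf, bad key order: block=70852288512,root=1, slot=8",
    "[   37.176481] btrfs: Transaction aborted",
]

if __name__ == "__main__":
    for (block, root, slot), n in summarize(find_corrupt_leaves(SAMPLE)).items():
        print(f"block={block} root={root} slot={slot} reported {n} time(s)")
```

Grouping by (block, root, slot) makes it easy to see that both messages here refer to the same leaf rather than to two separate corruptions.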

I have no idea what btrfsck might do, but if there is a corrupt leaf,
running it might be in order.  I would make a device image first, as well.

-Eric



* Re: Debian 3.7.1 BTRFS crash
  2013-03-13  2:03 ` Eric Sandeen
@ 2013-03-13  5:07   ` Jérôme Poulin
  2013-03-13 10:56     ` Bart Noordervliet
                       ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jérôme Poulin @ 2013-03-13  5:07 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: russell, linux-btrfs

On Tue, Mar 12, 2013 at 10:03 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> [   37.176790] BTRFS error (device dm-0) in __btrfs_free_extent:5143: IO failure
> [   37.176791] btrfs is forced readonly
> [   37.176793] btrfs: run_one_delayed_ref returned -5
>


It seems the SSD has bad blocks now. BTRFS seems to abuse SSDs: since I
started using BTRFS I have burnt out 1 SSD and 2 USB flash drives, each
within about 2 months. ddrescue'ing the SSD would probably give better
chances of recovery and give BTRFS/btrfsck a chance to write correctly
to the newly copied image.


* Re: Debian 3.7.1 BTRFS crash
  2013-03-13  5:07   ` Jérôme Poulin
@ 2013-03-13 10:56     ` Bart Noordervliet
  2013-03-13 11:31       ` Swâmi Petaramesh
  2013-03-13 13:47     ` Eric Sandeen
  2013-03-14  9:48     ` Martin Steigerwald
  2 siblings, 1 reply; 14+ messages in thread
From: Bart Noordervliet @ 2013-03-13 10:56 UTC (permalink / raw)
  To: Jérôme Poulin; +Cc: linux-btrfs

On Wed, Mar 13, 2013 at 6:07 AM, Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> It seems the SSD has bad blocks now, BTRFS seems to abuse SSD disks, I
> burnt 1 SSD disk and 2 USB flash drive since I'm using BTRFS, in about
> 2 months for each.

USB flash drives are rubbish for any filesystem except FAT32, and even
then they only handle large sequential writes gracefully. A few years ago
I thought it would be a good idea to put the root partition of a few
of my small Debian servers on USB flash, so that the hard disks could
spin down at night and I could easily prepare and switch to a new
Debian version. However, each and every USB stick got trashed within a
year, no matter which brand, size or product line, and despite
specifically formatting them as ext3 without a journal. I now use a low-end
but recent series of SSDs and have had no such problems any more. I
don't use btrfs on them as yet, but ext4, even with a journal, is doing
just fine.

Regards,

Bart


* Re: Debian 3.7.1 BTRFS crash
  2013-03-13 10:56     ` Bart Noordervliet
@ 2013-03-13 11:31       ` Swâmi Petaramesh
  2013-03-14 19:04         ` Norbert Scheibner
  0 siblings, 1 reply; 14+ messages in thread
From: Swâmi Petaramesh @ 2013-03-13 11:31 UTC (permalink / raw)
  To: Bart Noordervliet; +Cc: Jérôme Poulin, linux-btrfs

On 13/03/2013 11:56, Bart Noordervliet wrote:
> USB flash drives are rubbish for any filesystem except FAT32 and then
> still only gracefully accept large sequential writes. A few years ago
> I thought it would be a good idea to put the root partition of a few
> of my small Debian servers on USB flash, so that the harddisks could
> spin down at night and I could easily prepare and switch a new
> Debian-version. However, each and every USB stick got trashed within a
> year
I have an ARM box that runs a little Debian server (typically an
advanced NAS); it uses a USB key as an ext2 root filesystem. Everything
but big storage is there, and it's been up and running 24/7 for 3+ years
without any USB key incident...

The USB key is a cheap 1 GB Verbatim I purchased from the nearest drugstore ;-)

-- 
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E
Don't bother looking: I am not on Facebook.



* Re: Debian 3.7.1 BTRFS crash
  2013-03-13  5:07   ` Jérôme Poulin
  2013-03-13 10:56     ` Bart Noordervliet
@ 2013-03-13 13:47     ` Eric Sandeen
  2013-03-13 14:03       ` Russell Coker
  2013-03-14  9:48     ` Martin Steigerwald
  2 siblings, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2013-03-13 13:47 UTC (permalink / raw)
  To: Jérôme Poulin; +Cc: russell, linux-btrfs

On 3/13/13 12:07 AM, Jérôme Poulin wrote:
> On Tue, Mar 12, 2013 at 10:03 PM, Eric Sandeen <sandeen@redhat.com> wrote:
>> [   37.176790] BTRFS error (device dm-0) in __btrfs_free_extent:5143: IO failure
>> [   37.176791] btrfs is forced readonly
>> [   37.176793] btrfs: run_one_delayed_ref returned -5
>>
> 
> 
> It seems the SSD has bad blocks now, BTRFS seems to abuse SSD disks, I
> burnt 1 SSD disk and 2 USB flash drive since I'm using BTRFS, in about
> 2 months for each. ddrescue'ing the SSD would probably give better
> chances of recovery and give BTRFS/btrfsck a chance to write correctly
> to the newly copied image.

On what do you base that theory?  I suppose it could be, but nothing
in the logs necessarily suggests that.  The "IO failure" is because 
the fs shut down, went readonly, and subsequent IOs got -EIO,
I think.
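
For reference, the "-5" in "run_one_delayed_ref returned -5" is a negated errno value, which is how kernel functions conventionally report errors. A small sketch to look up the symbolic name; this uses only the Python standard library, and the message text printed by `os.strerror` depends on the platform's C library.

```python
import errno
import os

ret = -5  # value from "run_one_delayed_ref returned -5"

# Kernel code returns negative errno values, so negate to look it up.
name = errno.errorcode[-ret]
print(name, "-", os.strerror(-ret))
```

On Linux this resolves to EIO, the same error code referred to as -EIO above.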

-Eric



* Re: Debian 3.7.1 BTRFS crash
  2013-03-13 13:47     ` Eric Sandeen
@ 2013-03-13 14:03       ` Russell Coker
  2013-03-13 19:19         ` Chris Mason
  0 siblings, 1 reply; 14+ messages in thread
From: Russell Coker @ 2013-03-13 14:03 UTC (permalink / raw)
  To: linux-btrfs

On Thu, 14 Mar 2013, Eric Sandeen <sandeen@redhat.com> wrote:
> > It seems the SSD has bad blocks now, BTRFS seems to abuse SSD disks, I
> > burnt 1 SSD disk and 2 USB flash drive since I'm using BTRFS, in about
> > 2 months for each. ddrescue'ing the SSD would probably give better
> > chances of recovery and give BTRFS/btrfsck a chance to write correctly
> > to the newly copied image.
> 
> On what do you base that theory?  I suppose it could be, but nothing
> in the logs necessarily suggests that.  The "IO failure" is because 
> the fs shut down, went readonly, and subsequent IOs got -EIO,
> I think.

I've just used nc to transfer the filesystem to another system, there were no 
read errors so I don't think that a SSD hardware failure is the problem here.
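
A full sequential read like this can also be done without copying anything, just to see whether any offset fails to read. The following is a hypothetical sketch, not what was actually run here (that was nc): the function name `scan_for_read_errors` and the 1 MiB chunk size are arbitrary choices, and a clean pass only shows that the device returned data for every chunk, not that the data is correct.

```python
def scan_for_read_errors(path, chunk_size=1024 * 1024):
    """Read `path` from start to end.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the start
    offset of every chunk whose read raised an OSError (e.g. EIO).
    """
    bad_offsets = []
    total = 0
    offset = 0
    # buffering=0 gives raw reads, so a failure maps to a known offset.
    with open(path, "rb", buffering=0) as dev:
        while True:
            try:
                data = dev.read(chunk_size)
            except OSError:
                # Note the failing chunk, then skip past it and carry on.
                bad_offsets.append(offset)
                offset += chunk_size
                dev.seek(offset)
                continue
            if not data:
                break  # end of file or device
            total += len(data)
            offset += len(data)
    return total, bad_offsets
```

Run against a block device (as root), an empty `bad_offsets` list corresponds to the "no read errors" observation above.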

I'm now getting similar problems running a 3.8 kernel with the filesystem on a 
loopback device.  I'll provide more information soon.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


* Re: Debian 3.7.1 BTRFS crash
  2013-03-13 14:03       ` Russell Coker
@ 2013-03-13 19:19         ` Chris Mason
  2013-03-14  6:36           ` Russell Coker
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Mason @ 2013-03-13 19:19 UTC (permalink / raw)
  To: Russell Coker; +Cc: linux-btrfs

On Wed, Mar 13, 2013 at 08:03:53AM -0600, Russell Coker wrote:
> On Thu, 14 Mar 2013, Eric Sandeen <sandeen@redhat.com> wrote:
> > > It seems the SSD has bad blocks now, BTRFS seems to abuse SSD disks, I
> > > burnt 1 SSD disk and 2 USB flash drive since I'm using BTRFS, in about
> > > 2 months for each. ddrescue'ing the SSD would probably give better
> > > chances of recovery and give BTRFS/btrfsck a chance to write correctly
> > > to the newly copied image.
> > 
> > On what do you base that theory?  I suppose it could be, but nothing
> > in the logs necessarily suggests that.  The "IO failure" is because 
> > the fs shut down, went readonly, and subsequent IOs got -EIO,
> > I think.
> 
> I've just used nc to transfer the filesystem to another system, there were no 
> read errors so I don't think that a SSD hardware failure is the problem here.
> 
> I'm now getting similar problems running a 3.8 kernel with the filesystem on a 
> loopback device.  I'll provide more information soon.

Bad key ordering is pretty rare, and it usually means memory
corruption.  Are you reproducing this on the same machine or a
different one?
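
To illustrate why a memory error tends to show up as bad key ordering: btrfs keys are (objectid, type, offset) triples, and the keys in a leaf must be sorted in that order. The sketch below is a simplified model, not kernel code. The key values, the flipped bit and the helper names are all made up, and the slot numbering is this sketch's own convention rather than necessarily what the kernel prints.

```python
from typing import List, NamedTuple, Optional


class Key(NamedTuple):
    """Simplified btrfs-style key; tuples compare as (objectid, type, offset)."""
    objectid: int
    type: int
    offset: int


def first_bad_slot(keys: List[Key]) -> Optional[int]:
    """Return the first slot whose key is not strictly greater than the one
    before it, or None if the whole leaf is in order."""
    for slot in range(1, len(keys)):
        if keys[slot] <= keys[slot - 1]:
            return slot
    return None


def flip_bit(key: Key, bit: int) -> Key:
    """Simulate a single-bit memory error in the objectid field."""
    return key._replace(objectid=key.objectid ^ (1 << bit))


# Made-up keys in strictly increasing order, as a healthy leaf holds them.
leaf = [Key(256 + i, 1, 0) for i in range(12)]

# Flip one (arbitrarily chosen) bit in the key stored at slot 7.
damaged = list(leaf)
damaged[7] = flip_bit(damaged[7], 40)

print("healthy leaf:", first_bad_slot(leaf))
print("damaged leaf:", first_bad_slot(damaged))
```

A single flipped bit turns one objectid into a huge number, so the next key in the leaf suddenly compares as smaller than its predecessor and the ordering check fails even though nothing was written wrongly on purpose.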

-chris



* Re: Debian 3.7.1 BTRFS crash
  2013-03-13 19:19         ` Chris Mason
@ 2013-03-14  6:36           ` Russell Coker
  2013-03-14 13:04             ` Chris Mason
  0 siblings, 1 reply; 14+ messages in thread
From: Russell Coker @ 2013-03-14  6:36 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

[-- Attachment #1: Type: Text/Plain, Size: 1178 bytes --]

On Thu, 14 Mar 2013, Chris Mason <chris.mason@fusionio.com> wrote:
> Bad key ordering is pretty rare, and it usually means memory
> corruptions.  Are you reproducing this on the same machine or a
> different one?

I've attached a kernel message log of mounting it on another system (which 
incidentally has ECC RAM) running the Debian package of kernel 3.8.2.  The end 
result of this was a system on which the sync command blocked in D state 
indefinitely and which couldn't be rebooted in any way other than a hardware 
reset.

After that I ran btrfsck (which reported lots of errors) and it appeared to 
mount correctly.  I haven't yet tried to verify the integrity of the contents.

I've now run memtest86+ on the origin system and it reported some memory 
errors.  I'm now in the process of trying to determine what parts of the 
hardware failed.

So while the original corrupted filesystem was probably no fault of BTRFS the 
fact that another system with no hardware problem failed to operate correctly 
after trying to mount it seems to be a bug.

Thanks for your advice.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

[-- Attachment #2: mount-log.txt.gz --]
[-- Type: application/x-gzip, Size: 7390 bytes --]


* Re: Debian 3.7.1 BTRFS crash
  2013-03-13  5:07   ` Jérôme Poulin
  2013-03-13 10:56     ` Bart Noordervliet
  2013-03-13 13:47     ` Eric Sandeen
@ 2013-03-14  9:48     ` Martin Steigerwald
  2 siblings, 0 replies; 14+ messages in thread
From: Martin Steigerwald @ 2013-03-14  9:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Jérôme Poulin, Eric Sandeen, russell

Hi Jérôme,

On Wednesday, 13 March 2013, Jérôme Poulin wrote:
> On Tue, Mar 12, 2013 at 10:03 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> > [   37.176790] BTRFS error (device dm-0) in __btrfs_free_extent:5143: IO failure
> > [   37.176791] btrfs is forced readonly
> > [   37.176793] btrfs: run_one_delayed_ref returned -5
> 
> It seems the SSD has bad blocks now, BTRFS seems to abuse SSD disks, I
> burnt 1 SSD disk and 2 USB flash drive since I'm using BTRFS, in about
> 2 months for each. ddrescue'ing the SSD would probably give better
> chances of recovery and give BTRFS/btrfsck a chance to write correctly
> to the newly copied image.

Well, the Intel SSD 320 in this ThinkPad T520 so far doesn't seem to have noticed
any significant abuse from BTRFS being in use:

smartctl-a-2013-03-14
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5250
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       202408
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203778

The value above has always been in that range. According to a PDF from Intel,
the Media_Wearout_Indicator below is the important one.

233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       202408


I have had a / of about 20 GB on BTRFS since the beginning. That's almost 2 years now.

Now I also have a 200 GB /home on BTRFS, since a month or two ago. Granted, this
is more data, but unless it is backed up by observed I/O patterns or the like, I
suggest being careful about claiming that BTRFS abuses SSDs based on your own
experience alone, and asking it as a question if you do not know for sure.

According to my irregular data points I also see no significant increase in
wear after I switched /home to BTRFS, although it is a bit premature to
say for sure - I will continue to keep an eye on it:

martin@merkaba:~/Computer/Merkaba/Intel SSD 320> for F in $(ls smartctl*) ; do echo "$F" | cut -c1-21 ; egrep "(Wear|Host_Writes|Erase_Fail_Count|Power_On_Hours)" "$F" ; done
smartctl-a-2011-05-19
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
smartctl-a-2011-05-19
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
smartctl-a-2011-06-23
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       324
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19158
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203342
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19158
smartctl-a-2011-06-23
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       324
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19158
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203342
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19158
smartctl-a-2011-06-23
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       325
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19160
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203342
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19160
smartctl-a-2011-06-23
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       320
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19041
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203342
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19041
smartctl-a-2011-12-16
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       271

Now that's funny. The Intel SSD went back in time? From 325 to 271 power-on
hours in half a year. I knew I had a time machine somewhere. I just forgot
where it is. :)

172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169

First occurrence of erase failures, but the count didn't rise after that.

No other error-related occurrences in other values so far :)

225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66757
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203450
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66757
smartctl-a-2011-12-16
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       271
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66759
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203450
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66759
smartctl-a-2011-12-16
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       271
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66757
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203450
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       66757
smartctl-a-2012-07-19
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2444
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       128105
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       314
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       128105
smartctl-a-2012-07-19
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2443
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       127984
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       314
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       127984
smartctl-a-2012-07-30
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2582
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       131072
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203604
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       131072
smartctl-a-2012-12-02
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4023
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       170107
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203703
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       170107
smartctl-a-2013-02-22
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5010
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       198165
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203768
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       198165
smartctl-a-2013-02-22
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5010
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       198163
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203768
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       198163
smartctl-a-2013-03-14
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5250
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       202408
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2203778
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       202408

Where there is more than one data point for a single day, they were taken
before and after self-tests.

Basically, the wear-related values didn't change much at all. The indicative
Media_Wearout_Indicator didn't change at all.
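
To put the Host_Writes numbers into more familiar units, here is a rough sketch of the arithmetic. It assumes that each raw unit of attribute 241 is 32 MiB, as the attribute name suggests, and it uses one of the several 2011-06-23 readings (19158) as the starting point. The helper names `raw_value` and `host_writes_gib` are made up.

```python
from datetime import date

UNIT_MIB = 32  # assumed from the attribute name Host_Writes_32MiB


def raw_value(smart_line: str) -> int:
    """Return the last column (the raw value) of a smartctl attribute row."""
    return int(smart_line.split()[-1])


def host_writes_gib(raw: int) -> float:
    """Convert a raw Host_Writes_32MiB count into GiB."""
    return raw * UNIT_MIB / 1024


# Two attribute-241 rows copied from the listing above.
first = (
    date(2011, 6, 23),
    "241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       19158",
)
last = (
    date(2013, 3, 14),
    "241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       202408",
)

total_gib = host_writes_gib(raw_value(last[1]))
delta_gib = total_gib - host_writes_gib(raw_value(first[1]))
days = (last[0] - first[0]).days

print(f"total host writes: {total_gib:.1f} GiB")
print(f"average since {first[0]}: {delta_gib / days:.2f} GiB/day over {days} days")
```

The per-day figure is only an average between two snapshots; it says nothing about how the writes were distributed or how much the drive wrote internally.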

I leave about 20 GB of the 300 GB free at most times. According to a paper
from Intel this helps long-term performance, and from my understanding of
how SSDs work it also helps longevity.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: Debian 3.7.1 BTRFS crash
  2013-03-14  6:36           ` Russell Coker
@ 2013-03-14 13:04             ` Chris Mason
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Mason @ 2013-03-14 13:04 UTC (permalink / raw)
  To: Russell Coker; +Cc: Chris Mason, linux-btrfs

On Thu, Mar 14, 2013 at 12:36:09AM -0600, Russell Coker wrote:
> On Thu, 14 Mar 2013, Chris Mason <chris.mason@fusionio.com> wrote:
> > Bad key ordering is pretty rare, and it usually means memory
> > corruptions.  Are you reproducing this on the same machine or a
> > different one?
> 
> I've attached a kernel message log of mounting it on another system (which 
> incidentally has ECC RAM) running the Debian package of kernel 3.8.2.  The end 
> result of this was a system on which the sync command blocked in D state 
> indefinitely and which couldn't be rebooted in any way other than a hardware 
> reset.

Just to make sure I've got the sequence right, this is mounting the same
corrupted image on a second system?

The end result of that should be some messages about the bad blocks we
found and then the FS forced readonly.  If not, you're right there is
definitely a bug there.

-chris


* Re: Debian 3.7.1 BTRFS crash
  2013-03-13 11:31       ` Swâmi Petaramesh
@ 2013-03-14 19:04         ` Norbert Scheibner
  2013-03-14 23:17           ` Martin Steigerwald
  0 siblings, 1 reply; 14+ messages in thread
From: Norbert Scheibner @ 2013-03-14 19:04 UTC (permalink / raw)
  To: linux-btrfs

On 13.03.2013 at 12:31, Swâmi Petaramesh <swami@petaramesh.org> wrote:

> On 13/03/2013 11:56, Bart Noordervliet wrote:
>> USB flash drives are rubbish for any filesystem except FAT32 and then
>> still only gracefully accept large sequential writes. A few years ago
>> I thought it would be a good idea to put the root partition of a few
>> of my small Debian servers on USB flash, so that the harddisks could
>> spin down at night and I could easily prepare and switch a new
>> Debian-version. However, each and every USB stick got trashed within a
>> year
> I have an ARM box that runs a little Debian server (typically an
> advanced NAS), it uses an USB key as an ext2 root filesystem. Everything
> but big storage is there, and it's been up and running 24/7 for 3+ years
> without any USB key incident...

The difference is the filesystem. Ext3 uses a journal, which always uses
the same physical sectors on disk. On a hard disk that does not matter;
rewrites are no problem for platters. On a modern SSD, the SSD controller
takes care of it and redirects the writes to different physical
sectors. USB sticks have no smart controller, so the writes always hit
the same physical sector; it's like burning a hole in the flash
chip. With the commit interval at the desktop default of 5 seconds,
a whole year means a lot of writes to the same sector on a USB stick.
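
As a back-of-the-envelope check of that argument, here is a worst-case sketch. It assumes every 5-second commit rewrites the same physical block, with no wear leveling and no idle periods, and the erase-cycle endurance figures are purely hypothetical values for illustration, not specifications of any real device.

```python
COMMIT_INTERVAL_S = 5                  # commit interval cited above
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

# Worst case: one rewrite of the same block per commit, all year long.
commits_per_year = SECONDS_PER_YEAR // COMMIT_INTERVAL_S
print(f"commits per year: {commits_per_year}")

# Hypothetical erase-cycle limits, for illustration only.
for endurance in (3_000, 10_000, 100_000):
    days = endurance * COMMIT_INTERVAL_S / SECONDS_PER_DAY
    print(f"{endurance:>7} cycles -> exhausted after about {days:.1f} days")
```

Under these assumptions even a generous endurance figure is used up in days rather than years, which is the point of the "burning a hole" picture; any wear leveling in the stick, or a system that is often idle, changes the result considerably.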

Regards
   Norbert



* Re: Debian 3.7.1 BTRFS crash
  2013-03-14 19:04         ` Norbert Scheibner
@ 2013-03-14 23:17           ` Martin Steigerwald
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Steigerwald @ 2013-03-14 23:17 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Norbert Scheibner

On Thursday, 14 March 2013, Norbert Scheibner wrote:
> On 13.03.2013 at 12:31, Swâmi Petaramesh <swami@petaramesh.org> wrote:
> > On 13/03/2013 11:56, Bart Noordervliet wrote:
> >> USB flash drives are rubbish for any filesystem except FAT32 and then
> >> still only gracefully accept large sequential writes. A few years ago
> >> I thought it would be a good idea to put the root partition of a few
> >> of my small Debian servers on USB flash, so that the harddisks could
> >> spin down at night and I could easily prepare and switch a new
> >> Debian-version. However, each and every USB stick got trashed within a
> >> year
> > 
> > I have an ARM box that runs a little Debian server (typically an
> > advanced NAS), it uses an USB key as an ext2 root filesystem.
> > Everything but big storage is there, and it's been up and running 24/7
> > for 3+ years without any USB key incident...
> 
> The difference is the fs. Ext3 uses a journal which uses always the same
> physical sectors on disc. If the disc is a hard disk, it does not matter,
> rewrites are no problem for platters. If it is an modern SSD, the SSD-
> controller takes care and redirects the writes to different physical
> sectors. USB-sticks have no smart controller and so the writes hit
> always the same physical sector, it's like burning a hole in the flash
> chip. If the commit time is standard for desktops set to 5 seconds, then
> a whole year means a lot of writes to the same sector on an USB-stick.

Are you sure that modern, high-quality USB sticks don't do any wear
leveling?

On some SD cards there is some FAT optimization in place[1][2], i.e. good
random access at the beginning of the drive, where the FAT and thus the random
metadata I/O are. Ext3 places metadata elsewhere - I believe around
the middle of the partition.

[1] Flash memory card design, FAT optimization

https://wiki.linaro.org/WorkingGroups/KernelArchived/Projects/FlashCardSurvey

[2] Arnd Bergmann, Optimizing Linux with cheap flash drives

https://lwn.net/Articles/428584/

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
