From: Josef Bacik <jbacik@fb.com>
To: <bo.li.liu@oracle.com>
Cc: <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] Btrfs: do not move em to modified list when unpinning
Date: Tue, 18 Nov 2014 11:03:43 -0500
Message-ID: <546B6DDF.1050909@fb.com>
In-Reply-To: <20141118071308.GB12504@localhost.localdomain>

On 11/18/2014 02:13 AM, Liu Bo wrote:
> On Fri, Nov 14, 2014 at 04:16:30PM -0500, Josef Bacik wrote:
>> We use the modified list to keep track of which extents have been modified so we
>> know which ones are candidates for logging at fsync() time.  Newly modified
>> extents are added to the list at modification time, around the same time the
>> ordered extent is created.  We do this so that we don't have to wait for ordered
>> extents to complete before we know what we need to log.  The problem is when
>> something like this happens:
>>
>> log extent 0-4k on inode 1
>> copy csum for 0-4k from ordered extent into log
>> sync log
>> commit transaction
>> log some other extent on inode 1
>> ordered extent for 0-4k completes and adds itself onto modified list again
>> log changed extents
>> see ordered extent for 0-4k has already been logged
>> 	at this point we assume the csum has been copied
>> sync log
>> crash
>>
>> On replay we will see the extent 0-4k in the log, drop the original 0-4k extent
>> (the same one that we are replaying), which also drops the csum, and then
>> we won't find the csum in the log for that bytenr.  This of course causes us to
>> have errors about not having csums for certain ranges of our inode.  So remove
>> the modified list manipulation in unpin_extent_cache; any modified extents
>> should have been added well before now, and we don't want them re-logged.  This
>> fixes my test that I could reliably reproduce this problem with.  Thanks,
>
> This will make em->generation remain -1 in the above case, no?
>
> The csum is copied after the ordered extent is marked "IO_DONE", but
> "sync log" happens before unpin_extent_cache(), so if we don't have it
> 're-logged', em->generation will not get updated, and so the
> btrfs_file_extent_item's generation will not be updated either.
>

Huh, this is a good point, but it brings up another horrible thing I just
realized: in the case that we die directly after that transaction
commit, we'll lose the extent anyway, because we'll drop the tree log and
the extent won't actually be in its real tree.  Woo, I love this stuff.
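
The failure sequence from the quoted patch description can be modeled as a toy simulation. This is a hypothetical sketch, not btrfs code: struct toy_extent, struct toy_log, toy_log_changed(), toy_commit(), toy_run() and the reinsert_on_unpin switch are all invented for illustration. It simplifies csum handling to "copy the csum from the ordered extent only while it is pending; once it has completed, assume an earlier log already copied it".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT btrfs code: one 0-4k extent, one log tree. */
struct toy_extent {
	bool on_modified_list; /* candidate for logging at fsync() time */
	bool ordered_done;     /* ordered extent for 0-4k has completed */
};

struct toy_log {
	bool has_extent;       /* extent item for 0-4k is in the log */
	bool has_csum;         /* csum for 0-4k is in the log */
};

/* "log changed extents": simplified assumption that the csum is copied
 * only while the ordered extent is still pending; after completion the
 * logger assumes the csum was already copied by an earlier log. */
static void toy_log_changed(struct toy_extent *em, struct toy_log *log)
{
	if (!em->on_modified_list)
		return;
	log->has_extent = true;
	if (!em->ordered_done)
		log->has_csum = true;
	em->on_modified_list = false;
}

/* "commit transaction": the log tree is dropped. */
static void toy_commit(struct toy_log *log)
{
	log->has_extent = false;
	log->has_csum = false;
}

/* Returns true if replay would find the extent in the log but no csum. */
static bool toy_run(bool reinsert_on_unpin)
{
	struct toy_extent em = { .on_modified_list = true, .ordered_done = false };
	struct toy_log log = { .has_extent = false, .has_csum = false };

	toy_log_changed(&em, &log);     /* log 0-4k, copy csum from ordered extent */
	toy_commit(&log);               /* sync log, then commit transaction */
	em.ordered_done = true;         /* ordered extent for 0-4k completes */
	if (reinsert_on_unpin)
		em.on_modified_list = true; /* old behaviour: re-added on unpin */
	toy_log_changed(&em, &log);     /* log changed extents, csum assumed copied */
	/* sync log, crash, replay */
	return log.has_extent && !log.has_csum;
}
```

In this model toy_run(true) ends with the extent in the log and no csum for it, while toy_run(false), mirroring the patch, leaves nothing for 0-4k in the second log. It deliberately does not model the em->generation concern raised above.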

Josef

Thread overview: 5+ messages
2014-11-14 21:16 [PATCH] Btrfs: do not move em to modified list when unpinning Josef Bacik
2014-11-18  7:13 ` Liu Bo
2014-11-18 16:03   ` Josef Bacik [this message]
2014-11-19  3:45 ` Dave Chinner
2014-11-19 14:57   ` Josef Bacik
