All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joe Thornber <thornber@redhat.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Morgan.Mears@netapp.com,
	device-mapper development <dm-devel@redhat.com>,
	Mike Snitzer <msnitzer@redhat.com>,
	"Alasdair G. Kergon" <agk@redhat.com>
Subject: Re: [PATCH for-3.15 1/3] dm: add era target
Date: Tue, 11 Mar 2014 14:56:16 +0000	[thread overview]
Message-ID: <20140311145616.GK16115@debian> (raw)
In-Reply-To: <alpine.LRH.2.02.1403101739270.28738@file01.intranet.prod.int.rdu2.redhat.com>

Hi Mikulas,

Thanks for taking the time to look through this.  I think you're right
on all counts.

- Joe


On Mon, Mar 10, 2014 at 08:08:14PM -0400, Mikulas Patocka wrote:
> Hi
> 
> I reviewed dm-era, here is a possible list of bugs:
> 
> There is no tracking of in-progress writes. Suppose that this happens:
> 1) a write is received, era_map sees that metadata_current_marked(era->md, 
>    block) is true, so it lets the write through
> 2) the user sends the checkpoint message
> 3) the write received in step 1) hits the disk and modifies it
> 4) the user sends metadata_take_snap message, the write is not in the 
>    snapshot, although it modified the disk between steps 2) and 4)
> You can use the functions dm_internal_suspend and dm_internal_resume to 
> clear in-progress bios - this is lightweight suspend/resume callable only 
> from the kernel.
> 
> era_destroy calls metadata_close(era->md); without checking that era->md 
> is not NULL. This can cause crash if era_destroy is called early in the 
> constructor.
> 
> era->sectors_per_block should be validated that it is not zero.
> 
> There is no write memory barrier between the setting of rpc->result and 
> rpc->complete:
>                 if (r)
>                         list_for_each_entry_safe(rpc, tmp, &calls, list)
>                                 rpc->result = r;
>         }
> 
>         list_for_each_entry_safe(rpc, tmp, &calls, list) {
>                 atomic_set(&rpc->complete, 1);
> And there is no read memory barrier between the reading of rpc->complete 
> and rpc->result:
>         wait_event(rpc->wait, atomic_read(&rpc->complete));
> 
>         return rpc->result;
> You should use completion (include/linux/completion.h) instead of 
> rpc->complete and rpc->wait, it simplifies the code and takes care of 
> memory barriers.
> 
> metadata_space_map_root is not aligned on 64-bit boundary, but it is 
> accessed as 64-bit numbers in sm_ll_open_metadata.
> 
> Mixing 32-bit and 64-bit current_era (I'm not sure it this is a bug or 
> not).
> 
> Missing read and write memory barriers around accesses to 
> archived_writesets (I'm not sure if this is a bug, but it looks strange).
> 
> create_fresh_metadata doesn't free md->tm and md->sm on errors.
> 
> format_metadata doesn't undo the effect of create_fresh_metadata if 
> write_superblock fails.
> 
> superblock_all_zeroes checks the full block for zeroes, although lvm 
> zeroes only zeroes 4k at the beginning of a new logical volume.
> 
> test_bit and test_and_set_bit take signed integer as a parameter (on some 
> architectures the argument is long, on some int), so you must restrict the 
> maximum number of blocks to 2^31-1. Or, alternatively, do not use test_bit 
> and test_and_set_bit and implement these functions on your own - this will 
> allow you to have 64-bit block numbers.
> 
> The argument to bitset_size may overflow, it is not checked. If you allow 
> 64-bit argument to bitset_size, you must also check that the conversion to 
> size_t does not overflow.
> 
> era_status: /* FIXME: do we need to protect this? */ dm_sm_get_nr_free - 
> on 32-bit architectures you need to lock it because 64-bit accesses are 
> nonatomic. You can see the implementation of i_size_read and i_size_write 
> and do something similar.
> 
> era_map doesn't distinguish REQ_FLUSH bios, it should just pass them 
> through without accessing bio->bi_iter.bi_sector.
> 
> 
> Possible improvements:
> 
> atomic64_inc(&md->current_era); - there is no concurrent access to 
> current_era, so you can use this to avoid the interlocked instruction:
> atomic64_set(&md->current_era, atomic64_read(&md->current_era) + 1)
> 
> process_deferred_bios: deferred_lock is never taken from interrupt, so you 
> can replace spin_lock_irqsave/spin_unlock_irqrestore with 
> spin_lock/spin_unlock.
> 
> writeset_test_and_set: test_and_set_bit is interlocked instruction - here, 
> no concurrent access happens, so use __test_and_set_bit, which is not 
> interlocked. Alternatively - reimplement it if you want 2^31 or more 
> blocks - see above.
> 
> 
> Mikulas

      reply	other threads:[~2014-03-11 14:56 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-07  0:47 [PATCH for-3.15 1/3] dm: add era target Mike Snitzer
2014-03-07  0:47 ` [PATCH for-3.15 2/3] dm era: support non power-of-2 blocksize Mike Snitzer
2014-03-07  0:47 ` [PATCH for-3.15 3/3] dm bitset: only flush the current word if it has been dirtied Mike Snitzer
2014-03-11  0:08 ` [PATCH for-3.15 1/3] dm: add era target Mikulas Patocka
2014-03-11 14:56   ` Joe Thornber [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140311145616.GK16115@debian \
    --to=thornber@redhat.com \
    --cc=Morgan.Mears@netapp.com \
    --cc=agk@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=mpatocka@redhat.com \
    --cc=msnitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.