Re: [PATCH 02/12] DRBD: activity_log

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Andi Kleen <andi@firstfloor.org>
To: Philipp Reisner <philipp.reisner@linbit.com>
Cc: linux-kernel@vger.kernel.org, gregkh@suse.de
Subject: Re: [PATCH 02/12] DRBD: activity_log
Date: Tue, 24 Mar 2009 13:27:51 +0100	[thread overview]
Message-ID: <87bprrhzqw.fsf@basil.nowhere.org> (raw)
In-Reply-To: <1237823287-12734-3-git-send-email-philipp.reisner@linbit.com> (Philipp Reisner's message of "Mon, 23 Mar 2009 16:47:57 +0100")

Philipp Reisner <philipp.reisner@linbit.com> writes:
> +
> +/* I do not believe that all storage medias can guarantee atomic
> + * 512 byte write operations. When the journal is read, only
> + * transactions with correct xor_sums are considered.
> + * sizeof() = 512 byte */
> +struct __attribute__((packed)) al_transaction {
> +	u32       magic;
> +	u32       tr_number;
> +	/* u32       tr_generation; TODO */

It would be difficult to "TODO" this because adding that field here would
break the complete disk format, wouldn't it?

> +	struct __attribute__((packed)) {
> +		u32 pos;
> +		u32 extent; } updates[1 + AL_EXTENTS_PT];
> +	u32       xor_sum;
> +};
> +	ok = bio_flagged(bio, BIO_UPTODATE) && md_io.error == 0;
> +
> +	/* check for unsupported barrier op.
> +	 * would rather check on EOPNOTSUPP, but that is not reliable.
> +	 * don't try again for ANY return value != 0 */
> +	if (unlikely(bio_barrier(bio) && !ok)) {

That's a good example for some code that shouldn't be in upstream. If
EOPNOTSUPP for barriers is really not reliable somewhere please just
fix that somewhere (if it's still true and not some ancient bug), not
add workarounds like this.

> +int drbd_md_sync_page_io(struct drbd_conf *mdev, struct drbd_backing_dev *bdev,
> +			 sector_t sector, int rw)
> +{
> +	int hardsect, mask, ok;
> +	int offset = 0;
> +	struct page *iop = mdev->md_io_page;
> +
> +	D_ASSERT(mutex_is_locked(&mdev->md_io_mutex));
> +
> +	if (!bdev->md_bdev) {
> +		if (DRBD_ratelimit(5*HZ, 5)) {

The kernel has standard functions for this, no need for own macros.

> +			ERR("bdev->md_bdev==NULL\n");
> +			dump_stack();
> +		}


And a rate limited dump_stack seems weird anyways.

> +		return 0;
> +	}
> +
> +	hardsect = drbd_get_hardsect(bdev->md_bdev);
> +	if (hardsect == 0)
> +		hardsect = MD_HARDSECT;
> +
> +	/* in case hardsect != 512 [ s390 only? ] */
> +	if (hardsect != MD_HARDSECT) {
> +		if (!mdev->md_io_tmpp) {
> +			struct page *page = alloc_page(GFP_NOIO);

At least the conventional wisdom is still that block devices should 
use mempools, not alloc_page even with NOIO, otherwise they might
not write out in all lowmem situations. There's been some VM work
to address this, but so far nobody was sure that it is sufficient.


> +			if (!page)
> +				return 0;

So you get a IO error or what happens here on out of memory?

> +
> +			drbd_WARN("Meta data's bdev hardsect = %d != %d\n",
> +			     hardsect, MD_HARDSECT);
> +			drbd_WARN("Workaround engaged (has performace impact).\n");
> +
> +			mdev->md_io_tmpp = page;
> +		}
> +
> +		mask = (hardsect / MD_HARDSECT) - 1;
> +		D_ASSERT(mask == 1 || mask == 3 || mask == 7);
> +		D_ASSERT(hardsect == (mask+1) * MD_HARDSECT);
> +		offset = sector & mask;
> +		sector = sector & ~mask;
> +		iop = mdev->md_io_tmpp;
> +
> +		if (rw == WRITE) {
> +			void *p = page_address(mdev->md_io_page);
> +			void *hp = page_address(mdev->md_io_tmpp);

What happens when the page is in highmem?

> +		al_work.old_enr = al_ext->lc_number;
> +		al_work.w.cb = w_al_write_transaction;
> +		drbd_queue_work_front(&mdev->data.work, &al_work.w);
> +		wait_for_completion(&al_work.event);
> +
> +		mdev->al_writ_cnt++;
> +
> +		spin_lock_irq(&mdev->al_lock);
> +		lc_changed(mdev->act_log, al_ext);
> +		spin_unlock_irq(&mdev->al_lock);
> +		wake_up(&mdev->al_wait);

The wake_up outside the lock looks a little dangerous.

> +int
> +w_al_write_transaction(struct drbd_conf *mdev, struct drbd_work *w, int unused)
> +{
> +	struct update_al_work *aw = (struct update_al_work *)w;
> +	struct lc_element *updated = aw->al_ext;
> +	const unsigned int new_enr = aw->enr;
> +	const unsigned int evicted = aw->old_enr;
> +
> +	struct al_transaction *buffer;
> +	sector_t sector;
> +	int i, n, mx;
> +	unsigned int extent_nr;
> +	u32 xor_sum = 0;
> +
> +	if (!inc_local(mdev)) {
> +		ERR("inc_local() failed in w_al_write_transaction\n");
> +		complete(&((struct update_al_work *)w)->event);
> +		return 1;
> +	}
> +	/* do we have to do a bitmap write, first?
> +	 * TODO reduce maximum latency:
> +	 * submit both bios, then wait for both,
> +	 * instead of doing two synchronous sector writes. */
> +	if (mdev->state.conn < Connected && evicted != LC_FREE)
> +		drbd_bm_write_sect(mdev, evicted/AL_EXT_PER_BM_SECT);
> +
> +	mutex_lock(&mdev->md_io_mutex); /* protects md_io_buffer, al_tr_cycle, ... */

Doing checksumming inside a lock looks nasty.

Didn't read further. It's a lot of code. This was not a complete review,
just some quick comments.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

next prev parent reply	other threads:[~2009-03-24 12:28 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-23 15:47 [PATCH 00/12] DRBD: a block device for HA clusters Philipp Reisner
2009-03-23 15:47 ` [PATCH 01/12] DRBD: lru_cache Philipp Reisner
2009-03-23 15:47   ` [PATCH 02/12] DRBD: activity_log Philipp Reisner
2009-03-23 15:47     ` [PATCH 03/12] DRBD: bitmap Philipp Reisner
2009-03-23 15:47       ` [PATCH 04/12] DRBD: request Philipp Reisner
2009-03-23 15:48         ` [PATCH 05/12] DRBD: userspace_interface Philipp Reisner
2009-03-23 15:48           ` [PATCH 06/12] DRBD: internal_data_structures Philipp Reisner
2009-03-23 15:48             ` [PATCH 07/12] DRBD: main Philipp Reisner
2009-03-23 15:48               ` [PATCH 08/12] DRBD: receiver Philipp Reisner
2009-03-23 15:48                 ` [PATCH 09/12] DRBD: proc Philipp Reisner
2009-03-23 15:48                   ` [PATCH 10/12] DRBD: worker Philipp Reisner
2009-03-23 15:48                     ` [PATCH 11/12] DRBD: misc Philipp Reisner
2009-03-23 15:48                       ` [PATCH 12/12] DRBD: final Philipp Reisner
2009-04-08 10:17                 ` [PATCH 08/12] DRBD: receiver Nikanth K
2009-04-08 15:10                   ` Philipp Reisner
2009-03-23 16:51               ` [PATCH 07/12] DRBD: main Alexey Dobriyan
2009-03-23 22:26                 ` Philipp Reisner
2009-04-08 10:16         ` [PATCH 04/12] DRBD: request Nikanth K
2009-04-08 10:16       ` [PATCH 03/12] DRBD: bitmap Nikanth K
2009-04-08 15:09         ` Philipp Reisner
2009-03-24 12:27     ` Andi Kleen [this message]
2009-03-25 10:27       ` [PATCH 02/12] DRBD: activity_log Philipp Reisner
2009-03-25 10:46         ` Andi Kleen
2009-03-25 10:57           ` Philipp Reisner
2009-03-23 15:58   ` [PATCH 01/12] DRBD: lru_cache Greg KH
2009-04-08 10:15   ` Nikanth K
2009-04-08 15:09     ` Philipp Reisner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bprrhzqw.fsf@basil.nowhere.org \
    --to=andi@firstfloor.org \
    --cc=gregkh@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=philipp.reisner@linbit.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox