From: Andi Kleen <andi@firstfloor.org>
To: Philipp Reisner <philipp.reisner@linbit.com>
Cc: linux-kernel@vger.kernel.org, gregkh@suse.de
Subject: Re: [PATCH 02/12] DRBD: activity_log
Date: Tue, 24 Mar 2009 13:27:51 +0100 [thread overview]
Message-ID: <87bprrhzqw.fsf@basil.nowhere.org> (raw)
In-Reply-To: <1237823287-12734-3-git-send-email-philipp.reisner@linbit.com> (Philipp Reisner's message of "Mon, 23 Mar 2009 16:47:57 +0100")
Philipp Reisner <philipp.reisner@linbit.com> writes:
> +
> +/* I do not believe that all storage medias can guarantee atomic
> + * 512 byte write operations. When the journal is read, only
> + * transactions with correct xor_sums are considered.
> + * sizeof() = 512 byte */
> +struct __attribute__((packed)) al_transaction {
> + u32 magic;
> + u32 tr_number;
> + /* u32 tr_generation; TODO */
It would be difficult to "TODO" this because adding that field here would
break the complete disk format, wouldn't it?
> + struct __attribute__((packed)) {
> + u32 pos;
> + u32 extent; } updates[1 + AL_EXTENTS_PT];
> + u32 xor_sum;
> +};
> + ok = bio_flagged(bio, BIO_UPTODATE) && md_io.error == 0;
> +
> + /* check for unsupported barrier op.
> + * would rather check on EOPNOTSUPP, but that is not reliable.
> + * don't try again for ANY return value != 0 */
> + if (unlikely(bio_barrier(bio) && !ok)) {
That's a good example for some code that shouldn't be in upstream. If
EOPNOTSUPP for barriers is really not reliable somewhere please just
fix that somewhere (if it's still true and not some ancient bug), not
add workarounds like this.
> +int drbd_md_sync_page_io(struct drbd_conf *mdev, struct drbd_backing_dev *bdev,
> + sector_t sector, int rw)
> +{
> + int hardsect, mask, ok;
> + int offset = 0;
> + struct page *iop = mdev->md_io_page;
> +
> + D_ASSERT(mutex_is_locked(&mdev->md_io_mutex));
> +
> + if (!bdev->md_bdev) {
> + if (DRBD_ratelimit(5*HZ, 5)) {
The kernel has standard functions for this, no need for own macros.
> + ERR("bdev->md_bdev==NULL\n");
> + dump_stack();
> + }
And a rate limited dump_stack seems weird anyways.
> + return 0;
> + }
> +
> + hardsect = drbd_get_hardsect(bdev->md_bdev);
> + if (hardsect == 0)
> + hardsect = MD_HARDSECT;
> +
> + /* in case hardsect != 512 [ s390 only? ] */
> + if (hardsect != MD_HARDSECT) {
> + if (!mdev->md_io_tmpp) {
> + struct page *page = alloc_page(GFP_NOIO);
At least the conventional wisdom is still that block devices should
use mempools, not alloc_page even with NOIO, otherwise they might
not write out in all lowmem situations. There's been some VM work
to address this, but so far nobody was sure that it is sufficient.
> + if (!page)
> + return 0;
So you get a IO error or what happens here on out of memory?
> +
> + drbd_WARN("Meta data's bdev hardsect = %d != %d\n",
> + hardsect, MD_HARDSECT);
> + drbd_WARN("Workaround engaged (has performace impact).\n");
> +
> + mdev->md_io_tmpp = page;
> + }
> +
> + mask = (hardsect / MD_HARDSECT) - 1;
> + D_ASSERT(mask == 1 || mask == 3 || mask == 7);
> + D_ASSERT(hardsect == (mask+1) * MD_HARDSECT);
> + offset = sector & mask;
> + sector = sector & ~mask;
> + iop = mdev->md_io_tmpp;
> +
> + if (rw == WRITE) {
> + void *p = page_address(mdev->md_io_page);
> + void *hp = page_address(mdev->md_io_tmpp);
What happens when the page is in highmem?
> + al_work.old_enr = al_ext->lc_number;
> + al_work.w.cb = w_al_write_transaction;
> + drbd_queue_work_front(&mdev->data.work, &al_work.w);
> + wait_for_completion(&al_work.event);
> +
> + mdev->al_writ_cnt++;
> +
> + spin_lock_irq(&mdev->al_lock);
> + lc_changed(mdev->act_log, al_ext);
> + spin_unlock_irq(&mdev->al_lock);
> + wake_up(&mdev->al_wait);
The wake_up outside the lock looks a little dangerous.
> +int
> +w_al_write_transaction(struct drbd_conf *mdev, struct drbd_work *w, int unused)
> +{
> + struct update_al_work *aw = (struct update_al_work *)w;
> + struct lc_element *updated = aw->al_ext;
> + const unsigned int new_enr = aw->enr;
> + const unsigned int evicted = aw->old_enr;
> +
> + struct al_transaction *buffer;
> + sector_t sector;
> + int i, n, mx;
> + unsigned int extent_nr;
> + u32 xor_sum = 0;
> +
> + if (!inc_local(mdev)) {
> + ERR("inc_local() failed in w_al_write_transaction\n");
> + complete(&((struct update_al_work *)w)->event);
> + return 1;
> + }
> + /* do we have to do a bitmap write, first?
> + * TODO reduce maximum latency:
> + * submit both bios, then wait for both,
> + * instead of doing two synchronous sector writes. */
> + if (mdev->state.conn < Connected && evicted != LC_FREE)
> + drbd_bm_write_sect(mdev, evicted/AL_EXT_PER_BM_SECT);
> +
> + mutex_lock(&mdev->md_io_mutex); /* protects md_io_buffer, al_tr_cycle, ... */
Doing checksumming inside a lock looks nasty.
Didn't read further. It's a lot of code. This was not a complete review,
just some quick comments.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
next prev parent reply other threads:[~2009-03-24 12:28 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-23 15:47 [PATCH 00/12] DRBD: a block device for HA clusters Philipp Reisner
2009-03-23 15:47 ` [PATCH 01/12] DRBD: lru_cache Philipp Reisner
2009-03-23 15:47 ` [PATCH 02/12] DRBD: activity_log Philipp Reisner
2009-03-23 15:47 ` [PATCH 03/12] DRBD: bitmap Philipp Reisner
2009-03-23 15:47 ` [PATCH 04/12] DRBD: request Philipp Reisner
2009-03-23 15:48 ` [PATCH 05/12] DRBD: userspace_interface Philipp Reisner
2009-03-23 15:48 ` [PATCH 06/12] DRBD: internal_data_structures Philipp Reisner
2009-03-23 15:48 ` [PATCH 07/12] DRBD: main Philipp Reisner
2009-03-23 15:48 ` [PATCH 08/12] DRBD: receiver Philipp Reisner
2009-03-23 15:48 ` [PATCH 09/12] DRBD: proc Philipp Reisner
2009-03-23 15:48 ` [PATCH 10/12] DRBD: worker Philipp Reisner
2009-03-23 15:48 ` [PATCH 11/12] DRBD: misc Philipp Reisner
2009-03-23 15:48 ` [PATCH 12/12] DRBD: final Philipp Reisner
2009-04-08 10:17 ` [PATCH 08/12] DRBD: receiver Nikanth K
2009-04-08 15:10 ` Philipp Reisner
2009-03-23 16:51 ` [PATCH 07/12] DRBD: main Alexey Dobriyan
2009-03-23 22:26 ` Philipp Reisner
2009-04-08 10:16 ` [PATCH 04/12] DRBD: request Nikanth K
2009-04-08 10:16 ` [PATCH 03/12] DRBD: bitmap Nikanth K
2009-04-08 15:09 ` Philipp Reisner
2009-03-24 12:27 ` Andi Kleen [this message]
2009-03-25 10:27 ` [PATCH 02/12] DRBD: activity_log Philipp Reisner
2009-03-25 10:46 ` Andi Kleen
2009-03-25 10:57 ` Philipp Reisner
2009-03-23 15:58 ` [PATCH 01/12] DRBD: lru_cache Greg KH
2009-04-08 10:15 ` Nikanth K
2009-04-08 15:09 ` Philipp Reisner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87bprrhzqw.fsf@basil.nowhere.org \
--to=andi@firstfloor.org \
--cc=gregkh@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=philipp.reisner@linbit.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox