From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Boaz Harrosh <bharrosh@panasas.com>
Subject: [PATCH 3.4 09/30] ore: Fix wrong math in allocation of per device BIO
Date: Tue, 11 Feb 2014 11:06:09 -0800 [thread overview]
Message-ID: <20140211184647.413723379@linuxfoundation.org> (raw)
In-Reply-To: <20140211184647.149512538@linuxfoundation.org>
3.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boaz Harrosh <bharrosh@panasas.com>
commit aad560b7f63b495f48a7232fd086c5913a676e6f upstream.
At IO preparation we calculate the max pages at each device and
allocate a BIO per device of that size. The calculation was wrong
on some unaligned corner cases offset/length combination and would
make prepare return with -ENOMEM. This would be bad for pnfs-objects
that would in that case IO through MDS. And fatal for exofs were it
would fail writes with EIO.
Fix it by doing the proper math, that will work in all cases. (I
ran a test with all possible offset/length combinations this time
round).
Also when reading we do not need to allocate for the parity units
since we jump over them.
Also lower the max_io_length to take into account the parity pages
so not to allocate BIOs bigger than PAGE_SIZE
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/exofs/ore.c | 37 +++++++++++++++++++++++++------------
include/scsi/osd_ore.h | 1 +
2 files changed, 26 insertions(+), 12 deletions(-)
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -103,7 +103,7 @@ int ore_verify_layout(unsigned total_com
layout->max_io_length =
(BIO_MAX_PAGES_KMALLOC * PAGE_SIZE - layout->stripe_unit) *
- layout->group_width;
+ (layout->group_width - layout->parity);
if (layout->parity) {
unsigned stripe_length =
(layout->group_width - layout->parity) *
@@ -286,7 +286,8 @@ int ore_get_rw_state(struct ore_layout
if (length) {
ore_calc_stripe_info(layout, offset, length, &ios->si);
ios->length = ios->si.length;
- ios->nr_pages = (ios->length + PAGE_SIZE - 1) / PAGE_SIZE;
+ ios->nr_pages = ((ios->offset & (PAGE_SIZE - 1)) +
+ ios->length + PAGE_SIZE - 1) / PAGE_SIZE;
if (layout->parity)
_ore_post_alloc_raid_stuff(ios);
}
@@ -536,6 +537,7 @@ void ore_calc_stripe_info(struct ore_lay
u64 H = LmodS - G * T;
u32 N = div_u64(H, U);
+ u32 Nlast;
/* "H - (N * U)" is just "H % U" so it's bound to u32 */
u32 C = (u32)(H - (N * U)) / stripe_unit + G * group_width;
@@ -568,6 +570,10 @@ void ore_calc_stripe_info(struct ore_lay
si->length = T - H;
if (si->length > length)
si->length = length;
+
+ Nlast = div_u64(H + si->length + U - 1, U);
+ si->maxdevUnits = Nlast - N;
+
si->M = M;
}
EXPORT_SYMBOL(ore_calc_stripe_info);
@@ -583,13 +589,16 @@ int _ore_add_stripe_unit(struct ore_io_s
int ret;
if (per_dev->bio == NULL) {
- unsigned pages_in_stripe = ios->layout->group_width *
- (ios->layout->stripe_unit / PAGE_SIZE);
- unsigned nr_pages = ios->nr_pages * ios->layout->group_width /
- (ios->layout->group_width -
- ios->layout->parity);
- unsigned bio_size = (nr_pages + pages_in_stripe) /
- ios->layout->group_width;
+ unsigned bio_size;
+
+ if (!ios->reading) {
+ bio_size = ios->si.maxdevUnits;
+ } else {
+ bio_size = (ios->si.maxdevUnits + 1) *
+ (ios->layout->group_width - ios->layout->parity) /
+ ios->layout->group_width;
+ }
+ bio_size *= (ios->layout->stripe_unit / PAGE_SIZE);
per_dev->bio = bio_kmalloc(GFP_KERNEL, bio_size);
if (unlikely(!per_dev->bio)) {
@@ -609,8 +618,12 @@ int _ore_add_stripe_unit(struct ore_io_s
added_len = bio_add_pc_page(q, per_dev->bio, pages[pg],
pglen, pgbase);
if (unlikely(pglen != added_len)) {
- ORE_DBGMSG("Failed bio_add_pc_page bi_vcnt=%u\n",
- per_dev->bio->bi_vcnt);
+ /* If bi_vcnt == bi_max then this is a SW BUG */
+ ORE_DBGMSG("Failed bio_add_pc_page bi_vcnt=0x%x "
+ "bi_max=0x%x BIO_MAX=0x%x cur_len=0x%x\n",
+ per_dev->bio->bi_vcnt,
+ per_dev->bio->bi_max_vecs,
+ BIO_MAX_PAGES_KMALLOC, cur_len);
ret = -ENOMEM;
goto out;
}
@@ -1099,7 +1112,7 @@ int ore_truncate(struct ore_layout *layo
size_attr->attr = g_attr_logical_length;
size_attr->attr.val_ptr = &size_attr->newsize;
- ORE_DBGMSG("trunc(0x%llx) obj_offset=0x%llx dev=%d\n",
+ ORE_DBGMSG2("trunc(0x%llx) obj_offset=0x%llx dev=%d\n",
_LLU(oc->comps->obj.id), _LLU(obj_size), i);
ret = _truncate_mirrors(ios, i * ios->layout->mirrors_p1,
&size_attr->attr);
--- a/include/scsi/osd_ore.h
+++ b/include/scsi/osd_ore.h
@@ -102,6 +102,7 @@ struct ore_striping_info {
unsigned unit_off;
unsigned cur_pg;
unsigned cur_comp;
+ unsigned maxdevUnits;
};
struct ore_io_state;
next prev parent reply other threads:[~2014-02-11 19:06 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-11 19:06 [PATCH 3.4 00/30] 3.4.80-stable review Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 01/30] SELinux: Fix memory leak upon loading policy Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 02/30] intel-iommu: fix off-by-one in pagetable freeing Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 03/30] audit: correct a type mismatch in audit_syscall_exit() Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 04/30] mmc: atmel-mci: fix timeout errors in SDIO mode when using DMA Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 05/30] slub: Fix calculation of cpu slabs Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 07/30] ACPI / init: Flag use of ACPI and ACPI idioms for power supplies to regulator API Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 08/30] mtd: mxc_nand: remove duplicated ecc_stats counting Greg Kroah-Hartman
2014-02-11 19:06 ` Greg Kroah-Hartman [this message]
2014-02-11 19:06 ` [PATCH 3.4 10/30] IB/qib: Fix QP check when looping back to/from QP1 Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 11/30] spidev: fix hang when transfer_one_message fails Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 12/30] NFSv4: OPEN must handle the NFS4ERR_IO return code correctly Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 13/30] nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 14/30] sunrpc: Fix infinite loop in RPC state machine Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 15/30] dm: wait until embedded kobject is released before destroying a device Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 16/30] dm space map common: make sure new space is used during extend Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 20/30] drm/radeon: set the full cache bit for fences on r7xx+ Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 21/30] drm/radeon/DCE4+: clear bios scratch dpms bit (v2) Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 22/30] PCI: Enable ARI if dev and upstream bridge support it; disable otherwise Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 23/30] hpfs: deadlock and race in directory lseek() Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 24/30] sched/rt: Fix SCHED_RR across cgroups Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 25/30] sched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 26/30] sched: Unthrottle rt runqueues in __disable_runtime() Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 27/30] sched/rt: Avoid updating RT entry timeout twice within one tick period Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 28/30] rtc-cmos: Add an alarm disable quirk Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 29/30] timekeeping: Avoid possible deadlock from clock_was_set_delayed Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 30/30] 3.4.y: timekeeping: fix 32-bit overflow in get_monotonic_boottime Greg Kroah-Hartman
2014-02-12 4:19 ` [PATCH 3.4 00/30] 3.4.80-stable review Guenter Roeck
2014-02-12 18:55 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140211184647.413723379@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=bharrosh@panasas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).