From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Vivek Goyal <vgoyal@redhat.com>,
Tejun Heo <tj@kernel.org>, Shaohua Li <shli@fb.com>,
Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 4.14 49/74] block-throttle: avoid double charge
Date: Wed, 27 Dec 2017 17:46:22 +0100
Message-ID: <20171227164616.049644783@linuxfoundation.org>
In-Reply-To: <20171227164614.109898944@linuxfoundation.org>
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shaohua Li <shli@fb.com>
commit 111be883981748acc9a56e855c8336404a8e787c upstream.
If a bio is throttled and then split after throttling, the bio could be
resubmitted and enter the throttling path again. This causes part of the
bio to be charged multiple times. If the cgroup has an IO limit, the
double charge significantly harms performance. Bio splitting has become
quite common since the arbitrary bio size change.
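
To make the accounting problem concrete, here is a minimal userspace
sketch of the behaviour described above. It is not kernel code: the
toy_* names, the single byte counter, and the split at an arbitrary
offset are all assumptions made for illustration. It only models a flag
that does not survive issue, so the remainder of a split bio is charged
a second time when it is resubmitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not the kernel's struct bio. */
struct toy_bio {
	unsigned long size;	/* bytes covered by the bio */
	bool throttled;		/* stands in for BIO_THROTTLED */
};

/* Bytes accounted against an imaginary cgroup limit. */
struct toy_group {
	unsigned long charged;
};

/* Models the old exit path: the flag is cleared when the bio is issued. */
static void toy_throtl_old(struct toy_group *tg, struct toy_bio *bio)
{
	if (!bio->throttled)
		tg->charged += bio->size;
	bio->throttled = false;
}

/*
 * Charge a bio of 'size' bytes, split it at 'front' bytes afterwards and
 * resubmit the remainder. Returns the total number of bytes charged.
 */
static unsigned long toy_double_charge(unsigned long size, unsigned long front)
{
	struct toy_group tg = { 0 };
	struct toy_bio bio = { .size = size, .throttled = false };
	struct toy_bio rest;

	assert(front <= size);

	toy_throtl_old(&tg, &bio);	/* whole bio charged once */

	rest.size = size - front;	/* remainder becomes a new submission */
	rest.throttled = bio.throttled;	/* already cleared, nothing to inherit */
	toy_throtl_old(&tg, &rest);	/* remainder charged a second time */

	return tg.charged;
}
```

In this model a 1024-byte bio split at 256 bytes is charged 1024 + 768
bytes, which is the overcharge the commit message refers to.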
To fix this, we always set the BIO_THROTTLED flag once a bio has been
throttled. If the bio is cloned/split, we copy the flag to the new bio
too, to avoid a double charge. However, a cloned bio could be directed
to a new disk, in which case keeping the flag would be a problem. The
observation is that we always set a new disk for the bio in this case,
so we can clear the flag in bio_set_dev().
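
The three rules of the fix (set the flag on exit, copy it on clone,
clear it when the disk changes) can be sketched in the same toy style.
Again this is only an illustration under stated assumptions: the toy_*
helpers are hypothetical, disks are plain integers, and one counter
stands in for all throttle groups, whereas in the kernel a different
disk would be accounted by its own queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not the kernel's struct bio. */
struct toy_bio {
	unsigned long size;	/* bytes covered by the bio */
	int disk;		/* stands in for bi_disk */
	bool throttled;		/* stands in for BIO_THROTTLED */
};

struct toy_group {
	unsigned long charged;	/* bytes accounted so far */
};

/* Rule 1: a bio leaves the throttling path with the flag always set. */
static void toy_throtl(struct toy_group *tg, struct toy_bio *bio)
{
	if (!bio->throttled)
		tg->charged += bio->size;
	bio->throttled = true;
}

/* Rule 2: a clone (or split part) inherits the flag from its source. */
static struct toy_bio toy_clone(const struct toy_bio *src, unsigned long size)
{
	struct toy_bio bio = {
		.size = size,
		.disk = src->disk,
		.throttled = src->throttled,
	};

	assert(size <= src->size);
	return bio;
}

/* Rule 3: redirecting to a different disk clears the flag. */
static void toy_set_dev(struct toy_bio *bio, int disk)
{
	if (bio->disk != disk)
		bio->throttled = false;
	bio->disk = disk;
}

/* Charge a bio on disk 0, split it, send the remainder to 'new_disk'. */
static unsigned long toy_charge_after_split(unsigned long size,
					    unsigned long front, int new_disk)
{
	struct toy_group tg = { 0 };
	struct toy_bio bio = { .size = size, .disk = 0, .throttled = false };
	struct toy_bio rest;

	assert(front <= size);

	toy_throtl(&tg, &bio);
	rest = toy_clone(&bio, size - front);
	toy_set_dev(&rest, new_disk);
	toy_throtl(&tg, &rest);

	return tg.charged;
}
```

With the remainder staying on the same disk the total charge equals the
original bio size; only when the remainder is redirected to another disk
is it charged again, which is the case bio_set_dev() is meant to handle.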
This issue has existed for a long time; the arbitrary bio size change
just made it worse, so this should go into stable at least since v4.2.
V1 -> V2: Do not add an extra field to bio, based on discussion with Tejun
Cc: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
block/bio.c | 2 ++
block/blk-throttle.c | 8 +-------
include/linux/bio.h | 2 ++
include/linux/blk_types.h | 9 ++++-----
4 files changed, 9 insertions(+), 12 deletions(-)
--- a/block/bio.c
+++ b/block/bio.c
@@ -599,6 +599,8 @@ void __bio_clone_fast(struct bio *bio, s
bio->bi_disk = bio_src->bi_disk;
bio->bi_partno = bio_src->bi_partno;
bio_set_flag(bio, BIO_CLONED);
+ if (bio_flagged(bio_src, BIO_THROTTLED))
+ bio_set_flag(bio, BIO_THROTTLED);
bio->bi_opf = bio_src->bi_opf;
bio->bi_write_hint = bio_src->bi_write_hint;
bio->bi_iter = bio_src->bi_iter;
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2223,13 +2223,7 @@ again:
out_unlock:
spin_unlock_irq(q->queue_lock);
out:
- /*
- * As multiple blk-throtls may stack in the same issue path, we
- * don't want bios to leave with the flag set. Clear the flag if
- * being issued.
- */
- if (!throttled)
- bio_clear_flag(bio, BIO_THROTTLED);
+ bio_set_flag(bio, BIO_THROTTLED);
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
if (throttled || !td->track_bio_latency)
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -504,6 +504,8 @@ extern unsigned int bvec_nr_vecs(unsigne
#define bio_set_dev(bio, bdev) \
do { \
+ if ((bio)->bi_disk != (bdev)->bd_disk) \
+ bio_clear_flag(bio, BIO_THROTTLED);\
(bio)->bi_disk = (bdev)->bd_disk; \
(bio)->bi_partno = (bdev)->bd_partno; \
} while (0)
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -50,8 +50,6 @@ struct blk_issue_stat {
struct bio {
struct bio *bi_next; /* request queue link */
struct gendisk *bi_disk;
- u8 bi_partno;
- blk_status_t bi_status;
unsigned int bi_opf; /* bottom bits req flags,
* top bits REQ_OP. Use
* accessors.
@@ -59,8 +57,8 @@ struct bio {
unsigned short bi_flags; /* status, etc and bvec pool number */
unsigned short bi_ioprio;
unsigned short bi_write_hint;
-
- struct bvec_iter bi_iter;
+ blk_status_t bi_status;
+ u8 bi_partno;
/* Number of segments in this BIO after
* physical address coalescing is performed.
@@ -74,8 +72,9 @@ struct bio {
unsigned int bi_seg_front_size;
unsigned int bi_seg_back_size;
- atomic_t __bi_remaining;
+ struct bvec_iter bi_iter;
+ atomic_t __bi_remaining;
bio_end_io_t *bi_end_io;
void *bi_private;