All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Daniel Browning <db@kavod.com>,
	Mike Snitzer <snitzer@redhat.com>,
	Alasdair G Kergon <agk@redhat.com>
Subject: [ 16/61] dm thin: fix queue limits stacking
Date: Tue, 12 Feb 2013 12:34:36 -0800	[thread overview]
Message-ID: <20130212203420.469918899@linuxfoundation.org> (raw)
In-Reply-To: <20130212203417.890993903@linuxfoundation.org>

3.7-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Snitzer <snitzer@redhat.com>

commit 0f640dca08330dfc7820d610578e5935b5e654b2 upstream.

thin_io_hints() is blindly copying the queue limits from the thin-pool
which can lead to incorrect limits being set.  The fix here simply
deletes the thin_io_hints() hook which leaves the existing stacking
infrastructure to set the limits correctly.

When a thin-pool uses an MD device for the data device a thin device
from the thin-pool must respect MD's constraints about disallowing a bio
from spanning multiple chunks.  Otherwise we can see problems.  If the raid0
chunksize is 1152K and thin-pool chunksize is 256K I see the following
md/raid0 error (with extra debug tracing added to thin_endio) when
mkfs.xfs is executed against the thin device:

md/raid0:md99: make_request bug: can't convert block across chunks or bigger than 1152k 6688 127
device-mapper: thin: bio sector=2080 err=-5 bi_size=130560 bi_rw=17 bi_vcnt=32 bi_idx=0

This extra DM debugging shows that the failing bio is spanning across
the first and second logical 1152K chunk (sector 2080 + 255 takes the
bio beyond the first chunk's boundary of sector 2304).  So the bio
splitting that DM is doing clearly isn't respecting the MD limits.

max_hw_sectors_kb is 127 for both the thin-pool and thin device
(queue_max_hw_sectors returns 255 so we'll excuse sysfs's lack of
precision).  So this explains why bi_size is 130560.

But the thin device's max_hw_sectors_kb should be 4 (PAGE_SIZE) given
that it doesn't have a .merge function (for bio_add_page to consult
indirectly via dm_merge_bvec) yet the thin-pool does sit above an MD
device that has a compulsory merge_bvec_fn.  This scenario is exactly
why DM must resort to sending single PAGE_SIZE bios to the underlying
layer. Some additional context for this is available in the header for
commit 8cbeb67a ("dm: avoid unsupported spanning of md stripe boundaries").

Long story short, the reason a thin device doesn't properly get
configured to have a max_hw_sectors_kb of 4 (PAGE_SIZE) is that
thin_io_hints() is blindly copying the queue limits from the thin-pool
device directly to the thin device's queue limits.

Fix this by eliminating thin_io_hints.  Doing so is safe because the
block layer's queue limits stacking already enables the upper level thin
device to inherit the thin-pool device's discard and minimum_io_size and
optimal_io_size limits that get set in pool_io_hints.  But avoiding the
queue limits copy allows the thin and thin-pool limits to be different
where it is important, namely max_hw_sectors_kb.

Reported-by: Daniel Browning <db@kavod.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/dm-thin.c |   13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2766,19 +2766,9 @@ static int thin_iterate_devices(struct d
 	return 0;
 }
 
-/*
- * A thin device always inherits its queue limits from its pool.
- */
-static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
-{
-	struct thin_c *tc = ti->private;
-
-	*limits = bdev_get_queue(tc->pool_dev->bdev)->limits;
-}
-
 static struct target_type thin_target = {
 	.name = "thin",
-	.version = {1, 5, 0},
+	.version = {1, 5, 1},
 	.module	= THIS_MODULE,
 	.ctr = thin_ctr,
 	.dtr = thin_dtr,
@@ -2787,7 +2777,6 @@ static struct target_type thin_target =
 	.postsuspend = thin_postsuspend,
 	.status = thin_status,
 	.iterate_devices = thin_iterate_devices,
-	.io_hints = thin_io_hints,
 };
 
 /*----------------------------------------------------------------*/



  parent reply	other threads:[~2013-02-12 21:08 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-12 20:34 [ 00/61] 3.7.8-stable review Greg Kroah-Hartman
2013-02-12 20:34 ` [ 01/61] rtlwifi: Fix the usage of the wrong variable in usb.c Greg Kroah-Hartman
2013-02-12 20:34 ` [ 02/61] rtlwifi: Fix scheduling while atomic bug Greg Kroah-Hartman
2013-02-12 20:34 ` [ 03/61] regulator: max8998: fix incorrect min_uV value for ldo10 Greg Kroah-Hartman
2013-02-12 20:34 ` [ 04/61] regulator: clear state each invocation of of_regulator_match Greg Kroah-Hartman
2013-02-12 20:34 ` [ 05/61] regulator: s2mps11: fix incorrect register for buck10 Greg Kroah-Hartman
2013-02-12 20:34 ` [ 06/61] IB/qib: Fix for broken sparse warning fix Greg Kroah-Hartman
2013-02-12 20:34 ` [ 07/61] virtio_console: Dont access uninitialized data Greg Kroah-Hartman
2013-02-12 20:34 ` [ 08/61] Bluetooth: Fix handling of unexpected SMP PDUs Greg Kroah-Hartman
2013-02-12 20:34 ` [ 09/61] Revert "iwlwifi: fix the reclaimed packet tracking upon flush queue" Greg Kroah-Hartman
2013-02-12 20:34 ` [ 10/61] can: c_can: Set reserved bit in IFx_MASK2 to 1 on write Greg Kroah-Hartman
2013-02-12 20:34 ` [ 11/61] mwifiex: fix incomplete scan in case of IE parsing error Greg Kroah-Hartman
2013-02-12 20:34 ` [ 12/61] e1000e: enable ECC on I217/I218 to catch packet buffer memory errors Greg Kroah-Hartman
2013-02-12 20:34 ` [ 13/61] media: pwc-if: must check vb2_queue_init() success Greg Kroah-Hartman
2013-02-12 20:34 ` [ 14/61] ath9k_hw: fix calibration issues on chainmask that dont include chain 0 Greg Kroah-Hartman
2013-02-12 20:34 ` [ 15/61] mfd: db8500-prcmu: Fix irqdomain usage Greg Kroah-Hartman
2013-02-12 20:34 ` Greg Kroah-Hartman [this message]
2013-02-12 20:34 ` [ 17/61] net: prevent setting ttl=0 via IP_TTL Greg Kroah-Hartman
2013-02-12 20:34 ` [ 18/61] ipv6: fix the noflags test in addrconf_get_prefix_route Greg Kroah-Hartman
2013-02-12 20:34 ` [ 19/61] net, wireless: overwrite default_ethtool_ops Greg Kroah-Hartman
2013-02-12 20:34 ` [ 20/61] tcp: fix a panic on UP machines in reqsk_fastopen_remove Greg Kroah-Hartman
2013-02-12 20:34 ` [ 21/61] MAINTAINERS: Stephen Hemminger email change Greg Kroah-Hartman
2013-02-12 20:34 ` [ 22/61] ipv6: fix header length calculation in ip6_append_data() Greg Kroah-Hartman
2013-02-12 20:34 ` [ 23/61] macvlan: fix macvlan_get_size() Greg Kroah-Hartman
2013-02-12 20:34 ` [ 24/61] net: calxedaxgmac: throw away overrun frames Greg Kroah-Hartman
2013-02-12 20:34 ` [ 25/61] net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode Greg Kroah-Hartman
2013-02-12 20:34 ` [ 26/61] net/mlx4_core: Set number of msix vectors under SRIOV mode to firmware defaults Greg Kroah-Hartman
2013-02-12 20:34 ` [ 27/61] tcp: fix incorrect LOCKDROPPEDICMPS counter Greg Kroah-Hartman
2013-02-12 20:34   ` Greg Kroah-Hartman
2013-02-12 20:34 ` [ 28/61] isdn/gigaset: fix zero size border case in debug dump Greg Kroah-Hartman
2013-02-12 20:34 ` [ 29/61] netxen: fix off by one bug in netxen_release_tx_buffer() Greg Kroah-Hartman
2013-02-12 20:34 ` [ 30/61] r8169: remove the obsolete and incorrect AMD workaround Greg Kroah-Hartman
2013-02-12 20:34   ` Greg Kroah-Hartman
2013-02-12 20:34 ` [ 31/61] net: loopback: fix a dst refcounting issue Greg Kroah-Hartman
2013-02-12 20:34 ` [ 32/61] IP_GRE: Fix kernel panic in IP_GRE with GRE csum Greg Kroah-Hartman
2013-02-12 20:34 ` [ 33/61] pktgen: correctly handle failures when adding a device Greg Kroah-Hartman
2013-02-12 20:34 ` [ 34/61] ipv6: do not create neighbor entries for local delivery Greg Kroah-Hartman
2013-02-12 20:34 ` [ 35/61] via-rhine: Fix bugs in NAPI support Greg Kroah-Hartman
2013-02-12 20:34 ` [ 36/61] packet: fix leakage of tx_ring memory Greg Kroah-Hartman
2013-02-12 20:34 ` [ 37/61] ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit() Greg Kroah-Hartman
2013-02-12 20:34 ` [ 38/61] atm/iphase: rename fregt_t -> ffreg_t Greg Kroah-Hartman
2013-02-12 20:34 ` [ 39/61] xen/netback: shutdown the ring if it contains garbage Greg Kroah-Hartman
2013-02-12 20:35 ` [ 40/61] xen/netback: dont leak pages on failure in xen_netbk_tx_check_gop Greg Kroah-Hartman
2013-02-12 20:35 ` [ 41/61] xen/netback: free already allocated memory on failure in xen_netbk_get_requests Greg Kroah-Hartman
2013-02-12 20:35 ` [ 42/61] netback: correct netbk_tx_err to handle wrap around Greg Kroah-Hartman
2013-02-12 20:35 ` [ 43/61] ipv4: Remove output route check in ipv4_mtu Greg Kroah-Hartman
2013-02-12 20:35   ` Greg Kroah-Hartman
2013-02-12 20:35 ` [ 44/61] ipv4: Dont update the pmtu on mtu locked routes Greg Kroah-Hartman
2013-02-12 20:35 ` [ 45/61] ipv6: Add an error handler for icmp6 Greg Kroah-Hartman
2013-02-12 20:35 ` [ 46/61] ipv4: Invalidate the socket cached route on pmtu events if possible Greg Kroah-Hartman
2013-02-12 20:35 ` [ 47/61] ipv4: Add a socket release callback for datagram sockets Greg Kroah-Hartman
2013-02-12 20:35 ` [ 48/61] ipv4: Fix route refcount on pmtu discovery Greg Kroah-Hartman
2013-02-12 20:35 ` [ 49/61] sctp: refactor sctp_outq_teardown to insure proper re-initalization Greg Kroah-Hartman
2013-02-12 20:35 ` [ 50/61] net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree Greg Kroah-Hartman
2013-02-12 20:35 ` [ 51/61] net: sctp: sctp_endpoint_free: zero out secret key data Greg Kroah-Hartman
2013-02-12 20:35 ` [ 52/61] tcp: detect SYN/data drop when F-RTO is disabled Greg Kroah-Hartman
2013-02-12 20:35 ` [ 53/61] tcp: fix an infinite loop in tcp_slow_start() Greg Kroah-Hartman
2013-02-12 20:35   ` Greg Kroah-Hartman
2013-02-12 20:35 ` [ 54/61] tcp: frto should not set snd_cwnd to 0 Greg Kroah-Hartman
2013-02-12 20:35   ` Greg Kroah-Hartman
2013-02-12 20:35 ` [ 55/61] tcp: fix for zero packets_in_flight was too broad Greg Kroah-Hartman
2013-02-12 20:35   ` Greg Kroah-Hartman
2013-02-12 20:35 ` [ 56/61] tcp: dont abort splice() after small transfers Greg Kroah-Hartman
2013-02-12 20:35 ` [ 57/61] tcp: splice: fix an infinite loop in tcp_read_sock() Greg Kroah-Hartman
2013-02-12 20:35 ` [ 58/61] tcp: fix splice() and tcp collapsing interaction Greg Kroah-Hartman
2013-02-12 20:35 ` [ 59/61] net: splice: avoid high order page splitting Greg Kroah-Hartman
2013-02-12 20:35 ` [ 60/61] net: splice: fix __splice_segment() Greg Kroah-Hartman
2013-02-12 20:35 ` [ 61/61] drm/nouveau: add lockdep annotations Greg Kroah-Hartman
2013-02-13  3:35   ` Peter Hurley
2013-02-13  9:33     ` Arend van Spriel
2013-02-13  9:43       ` Ben Skeggs
2013-02-13  9:52         ` Arend van Spriel
2013-02-13 17:46         ` Marcin Slusarz
2013-02-13 18:38           ` Marcin Slusarz
2013-02-13  7:06 ` [ 00/61] 3.7.8-stable review Satoru Takeuchi
2013-02-13 15:51 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130212203420.469918899@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=agk@redhat.com \
    --cc=db@kavod.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=snitzer@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.