From: Maxime Ripard
Subject: Re: Possible RAID6 regression with ASYNC_TX_DMA enabled in 4.1
Date: Tue, 12 May 2015 14:55:46 +0200
Message-ID: <20150512125546.GJ10961@lukather>
References: <20150507125702.GI11057@lukather> <20150511062638.GA63893@kernel.org>
In-Reply-To: <20150511062638.GA63893@kernel.org>
Sender: linux-kernel-owner@vger.kernel.org
To: Shaohua Li
Cc: Neil Brown, linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, Lior Amsalem, Thomas Petazzoni, Gregory Clement, Boris Brezillon
List-Id: linux-raid.ids

Hi Shaohua,

On Sun, May 10, 2015 at 11:26:38PM -0700, Shaohua Li wrote:
> On Thu, May 07, 2015 at 02:57:02PM +0200, Maxime Ripard wrote:
> > Hi,
> >
> > I'm currently trying to add support for the PQ operations on the
> > Marvell XOR engine, in dmaengine, obviously to be able to use async_tx
> > to offload these operations.
> >
> > I'm testing these patches with a RAID6 array with 4 disks.
> >
> > However, since the commit 59fc630b8b5f ("RAID5: batch adjacent full
> > stripe write"), every write to that array fails with the following
> > stacktrace.
> >
> > http://code.bulix.org/eh8iew-88342?raw
> >
> > It seems to be generated by that warning here:
> >
> > http://lxr.free-electrons.com/source/crypto/async_tx/async_tx.c#L173
> >
> > And indeed, if we dump the status of depend_tx here, it's already been
> > acked.
> >
> > That doesn't happen if ASYNC_TX_DMA is disabled, hence using the
> > software version of it, instead of relying on our XOR engine. It
> > doesn't happen on any commit prior to the one mentioned above, with
> > the exact same changes applied. These changes are meant to be
> > contributed, so I can definitely push them somewhere if needed.
> >
> > I don't really know where to look, though. The change that is
> > causing this is probably the change in ops_run_reconstruct6, but I'm
> > not sure that this partial revert alone would work with regard to the
> > rest of the patch.
>
> I don't have a machine with dmaengine, so it's likely there is an error on
> this side. Could you please make stripe_can_batch() always return false and
> check if the error disappears? This should narrow down whether it's related
> to the batch issue.

The error indeed disappears if stripe_can_batch always returns false.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com