From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Steve Wise"
Subject: RE: [PATCH 0/1] Fix for nvme-rdma host crash in nvmf-all.3
Date: Thu, 23 Jun 2016 08:59:56 -0500
Message-ID: <004f01d1cd57$85fc0530$91f40f90$@opengridcomputing.com>
References: <576B8FB7.5000305@grimberg.me>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <576B8FB7.5000305-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Content-Language: en-us
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: 'Sagi Grimberg' , hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
Cc: linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: linux-rdma@vger.kernel.org

>
> > This patch fixes a touch-after-free bug I discovered. It is against
> > nvmf-all.3 branch of git://git.infradead.org/nvme-fabrics.git. The patch
> > is kind of ugly, so any ideas on a cleaner solution are welcome.
>
> Hey Steve, I don't see how this bug fixes the root-cause. Not exactly
> sure we understand the root-cause. Is it possible that this is a chelsio
> specific issue with send completion signaling (like we saw before)? Did
> this happen with a non-chelsio device?

Due to the stack trace, I believe this is a similar issue we saw before. It is
probably chelsio-specific. I don't see it on mlx4.

The fix for the previous occurrence of this crash was to signal all FLUSH
commands. Do you recall why that fixed it? Perhaps this failure path needs some
other signaled command to force the pending unsignaled WRs to be marked
"complete" by the driver?

Steve.
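
To make the question about unsignaled WRs concrete, here is a toy model in C. It is only a sketch of the hypothesis raised above, not code from nvme-rdma, the verbs API, or any driver. It assumes that a driver retires unsignaled send WRs only when it reaps the completion of a later signaled WR on the same queue. All names (toy_sq, toy_post_send, toy_poll_cq, toy_pending) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SQ_DEPTH 8

/* Hypothetical toy send queue; NOT the verbs API or any real driver. */
struct toy_wr {
    bool signaled;  /* this WR would generate a completion entry */
    bool retired;   /* the "driver" has marked this WR complete */
};

struct toy_sq {
    struct toy_wr wr[TOY_SQ_DEPTH];
    size_t posted;   /* number of WRs posted so far */
    size_t retired;  /* WRs in [0, retired) have been retired */
};

static void toy_sq_init(struct toy_sq *sq)
{
    sq->posted = 0;
    sq->retired = 0;
}

/* Post one send WR. Returns 0 on success, -1 if the queue is full. */
static int toy_post_send(struct toy_sq *sq, bool signaled)
{
    if (sq->posted == TOY_SQ_DEPTH)
        return -1;
    sq->wr[sq->posted].signaled = signaled;
    sq->wr[sq->posted].retired = false;
    sq->posted++;
    return 0;
}

/*
 * Assumed behavior: only a signaled WR produces a completion. When that
 * completion is reaped, every WR up to and including it is retired.
 * Returns the number of WRs retired by this call (0 if no signaled WR
 * is outstanding).
 */
static size_t toy_poll_cq(struct toy_sq *sq)
{
    for (size_t i = sq->retired; i < sq->posted; i++) {
        if (sq->wr[i].signaled) {
            size_t n = i + 1 - sq->retired;

            for (size_t j = sq->retired; j <= i; j++)
                sq->wr[j].retired = true;
            sq->retired = i + 1;
            return n;
        }
    }
    return 0;
}

/* WRs posted but not yet marked complete. */
static size_t toy_pending(const struct toy_sq *sq)
{
    return sq->posted - sq->retired;
}
```

In this model, a run of unsignaled WRs with no signaled WR behind it stays pending no matter how often the queue is polled. Posting one signaled WR afterwards lets the next poll retire the whole run. If the real driver behaves this way, freeing the buffers behind those WRs before a signaled completion arrives would match a touch-after-free.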