From: Jason Gunthorpe
To: Potnuri Bharat Teja
Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com, monis@mellanox.com, nirranjan@chelsio.com
Subject: Re: User SIW fails matching device
Date: Fri, 12 Jul 2019 11:35:46 -0300
Message-ID: <20190712143546.GD27512@ziepe.ca>
In-Reply-To: <20190712142718.GA26697@chelsio.com>
References: <20190712142718.GA26697@chelsio.com>

On Fri, Jul 12, 2019 at 07:57:19PM +0530, Potnuri Bharat Teja wrote:
> Hi all,
> I observe the following behavior on one of
> my machines configured for siw.
>
> Issue:
> SIW device gets wrong device ops (HW/real rdma driver device ops) instead of
> siw device ops due to improper device matching.
>
> Root-cause:
> In libibverbs, during user cma initialisation, for each entry from the driver
> list, the sysfs device is checked for a matching name or device.
> If the siw/rxe driver is at the head of the list, then the sysfs device matches
> properly with the corresponding siw driver and gets the corresponding siw/rxe
> device ops. Now, if the siw/rxe driver is after the real HW driver cxgb4/mlx5
> respectively in the driver list, then the siw sysfs device matches the PCI device
> and wrongly gets the device ops of the HW driver (cxgb4/mlx5).
>
> Below are debug prints from verbs_register_driver() and the driver_list entries,
> where siw is after cxgb4. I see verbs alloc context landing in cxgb4_alloc_context
> instead of siw_alloc_context, thus breaking user siw.
>
> verbs_register_driver_22: 184: driver 0x176e370
> verbs_register_driver_22: 185: name ipathverbs
> verbs_register_driver_22: 184: driver 0x176f6a0
> verbs_register_driver_22: 185: name cxgb4
> verbs_register_driver_22: 184: driver 0x176fd50
> verbs_register_driver_22: 185: name cxgb3
> verbs_register_driver_22: 184: driver 0x1777020
> verbs_register_driver_22: 185: name rxe
> verbs_register_driver_22: 184: driver 0x1770a30
> verbs_register_driver_22: 185: name siw
> verbs_register_driver_22: 184: driver 0x1771120
> verbs_register_driver_22: 185: name mlx4
> verbs_register_driver_22: 184: driver 0x1771990
> verbs_register_driver_22: 185: name mlx5
> verbs_register_driver_22: 184: driver 0x1771ff0
> verbs_register_driver_22: 185: name efa
>
> try_drivers: 372: driver 0x176e370, sysfs_dev 0x1776b20, name: ipathverbs
> try_drivers: 372: driver 0x176f6a0, sysfs_dev 0x1776b20, name: cxgb4
> try_drivers: 372: driver 0x176fd50, sysfs_dev 0x1776b20, name: cxgb3
> try_drivers: 372: driver 0x1777020, sysfs_dev 0x1776b20, name: rxe
> try_drivers: 372: driver 0x1770a30, sysfs_dev 0x1776b20, name: siw
> try_drivers: 372: driver 0x1771120, sysfs_dev 0x1776b20, name: mlx4
> try_drivers: 372: driver 0x1771990, sysfs_dev 0x1776b20, name: mlx5
> try_drivers: 372: driver 0x1771ff0, sysfs_dev 0x1776b20, name: efa
>
> Proposed fix:
> I have the below fix that works. It adds siw/rxe driver to the HEAD of the
> driver list and the rest to the tail. I am not sure if this fix is the ideal
> one, so I am attaching it to this mail.

Update your rdma-core to the latest version; there this is fixed fully by
using netlink to match the siw device.

Jason
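
For readers following the thread, the ordering problem described in the quoted report can be modelled with a small sketch. This is a toy model in Python, not libibverbs code: the `Driver` and `SysfsDev` classes, the `kernel_driver` field, the PCI identifier string, and both lookup functions are hypothetical names made up for illustration. It assumes only what the report states: drivers are tried in list order, a driver can claim a device by name or by the underlying PCI device, and the first match wins. The last function stands in for the idea of matching on an identity reported by the kernel rather than on list order; how rdma-core actually does this over netlink is not shown here.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class SysfsDev:
    ibdev_name: str      # name of the RDMA device as seen in sysfs
    parent_pci_id: str   # PCI identity of the NIC underneath (hypothetical value)
    kernel_driver: str   # driver identity as the kernel would report it (assumption)


@dataclass
class Driver:
    name: str
    name_patterns: tuple = ()            # device-name globs this driver claims
    pci_ids: frozenset = frozenset()     # PCI devices this driver claims

    def matches(self, dev: SysfsDev) -> bool:
        by_name = any(fnmatch(dev.ibdev_name, p) for p in self.name_patterns)
        by_pci = dev.parent_pci_id in self.pci_ids
        return by_name or by_pci


def try_drivers(drivers, dev):
    """First-match lookup: the result depends on list order."""
    for drv in drivers:
        if drv.matches(dev):
            return drv.name
    return None


def match_by_kernel_driver(drivers, dev):
    """Order-independent lookup keyed on the kernel-reported identity."""
    for drv in drivers:
        if drv.name == dev.kernel_driver:
            return drv.name
    return None


cxgb4 = Driver("cxgb4", pci_ids=frozenset({"pci:chelsio-nic"}))
rxe = Driver("rxe", name_patterns=("rxe*",))
siw = Driver("siw", name_patterns=("siw*",))
mlx5 = Driver("mlx5", pci_ids=frozenset({"pci:mellanox-nic"}))

# A siw device layered on a Chelsio NIC: its sysfs parent is the PCI device.
siw_dev = SysfsDev("siw_eth2", "pci:chelsio-nic", "siw")

hw_first = [cxgb4, rxe, siw, mlx5]         # order as in the debug prints
software_first = [siw, rxe, cxgb4, mlx5]   # order after the proposed fix

print(try_drivers(hw_first, siw_dev))
print(try_drivers(software_first, siw_dev))
print(match_by_kernel_driver(hw_first, siw_dev))
```

In this model, moving siw/rxe to the head of the list hides the problem but keeps the result dependent on registration order, whereas a lookup keyed on what the kernel reports does not care where the driver sits in the list.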