From mboxrd@z Thu Jan 1 00:00:00 1970 From: Joonas Lahtinen Date: Fri, 22 Sep 2017 11:11:03 +0000 Subject: Re: [PATCH][drm-next] drm/i915/gvt: ensure -ve return value is handled correctly Message-Id: <1506078663.6232.65.camel@linux.intel.com> List-Id: References: <20170919155534.25334-1-colin.king@canonical.com> <20170919214614.cfiolgznopouv34e@zhen-hp.sh.intel.com> <1505874923.2067.14.camel@perches.com> <20170920224406.jscthkglwfy3xhtf@zhen-hp.sh.intel.com> <1506004318.5382.4.camel@linux.intel.com> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: "Wang, Zhi A" , Zhenyu Wang , Joe Perches Cc: "Gao, Fred" , David Airlie , "intel-gfx@lists.freedesktop.org" , "kernel-janitors@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "dri-devel@lists.freedesktop.org" , "Vivi, Rodrigo" , Colin King , "intel-gvt-dev@lists.freedesktop.org" On Thu, 2017-09-21 at 16:17 +0000, Wang, Zhi A wrote: > Hi Joonas: > > Thanks for the introduction. I have been thinking about the > possibility of introducing GEM_BUG_ON into GVT-g recently and > investigating on it. I'm just a bit confused about the usage between > GEM_BUG_ON and WARN_ON. GEM_BUG_ON is basically there to catch things that we do not expect ever to happen within the driver. So we often list the function preconditions as GEM_BUG_ON. It's there for the same reason as the lockdep_assert_held and KASAN. It's sometimes heavy checks that we really want to run when functionally validating kernel. GEM_BUG_ON became to existence because adding checks for obvious conditions at the critical command submission path GEM is not sustainable for performance in production. The expectation is that each GEM_BUG_ON has a testcase in I-G-T that has the potential to hit it if driver was modified not to respect those preconditions. So once our testest passes, we can disable the GEM_BUG_ONs and be confident of the internal driver quality and get the release performance. WARN_ON is mostly used for the cases when the hardware is behaving differently than we expect. We can't remove them as we don't have all the hardware in the world to test, but we try to exercise them too through I-G-Ts. The test will often be the subtest that was written to reproduce the problem with our expectations of hardware in case of hangs and other bugs. After we've corrected the driver behaviour, or got a hardware W/A assigned, we keep the test and add a WARN_ON to make sure there will be no regression back to the same situation. This is at least what should happen, given time constraints, there may be variations. User behaving unexpectedly should never result in WARN_ON (or even worse, BUG_ON), should always just be debug messages displayed (not to trigger the CI) and errors propagated back to user: https://01.org/linuxgraphics/gfx-docs/drm/gpu/drm-uapi.html#recommended -ioctl-return-values Bare BUG_ON should only be used when there's the danger of corrupting system memory or filesystems, so from graphics driver, that's not very often. Controlled propagation of errors and maybe WARN_ON is always preferred if possible. > GEM_BUG_ON is only enabled when kernel debug is enabled, which mostly > is disabled in a production kernel. In the case of i915, I'm sure it > will be enabled in CI test so that it can catch broken code path. > Looking into GVT-g, the similar scenario is we enable it in QA test. > > Let's say GEM_BUG_ON can do its work very well in QA test but QA test > is not fully covered all the condition, then something might be still > broken when it comes to the production kernel for user and GEM_BUG_ON > will be disabled and will not catch that, I guess. > > That's my confusion which scratched my mind during the investigation: > If GEM_BUG_ON is not always working, then it looks WARN_ON should > always be used.... Expected to learn more about the story behind. :) So if the saying is some object is "never going to be bigger than 2G", there should be either: 1. GEM_BUG_ON like assertion for it and a test that tries to hit it, by trying to allocate a huge object for example, and should get rejection as -EINVAL 2. Test to see if the object is bigger, and propagate back the error if it is. Either resulting in user reported error if the origin of the object is outside of kernel <-> hardware. Or a WARN_ON if it's strange hardware or kernel driver behavior. You should choose depending on how often your function gets called, and how critical the execution time is. Hopefully this clarified things. Regards, Joonas -- Joonas Lahtinen Open Source Technology Center Intel Corporation From mboxrd@z Thu Jan 1 00:00:00 1970 From: Joonas Lahtinen Subject: Re: [PATCH][drm-next] drm/i915/gvt: ensure -ve return value is handled correctly Date: Fri, 22 Sep 2017 14:11:03 +0300 Message-ID: <1506078663.6232.65.camel@linux.intel.com> References: <20170919155534.25334-1-colin.king@canonical.com> <20170919214614.cfiolgznopouv34e@zhen-hp.sh.intel.com> <1505874923.2067.14.camel@perches.com> <20170920224406.jscthkglwfy3xhtf@zhen-hp.sh.intel.com> <1506004318.5382.4.camel@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: "Wang, Zhi A" , Zhenyu Wang , Joe Perches Cc: "Gao, Fred" , David Airlie , "intel-gfx@lists.freedesktop.org" , "kernel-janitors@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "dri-devel@lists.freedesktop.org" , "Vivi, Rodrigo" , Colin King , "intel-gvt-dev@lists.freedesktop.org" List-Id: dri-devel@lists.freedesktop.org T24gVGh1LCAyMDE3LTA5LTIxIGF0IDE2OjE3ICswMDAwLCBXYW5nLCBaaGkgQSB3cm90ZToKPiBI aSBKb29uYXM6Cj4gCj4gVGhhbmtzIGZvciB0aGUgaW50cm9kdWN0aW9uLiBJIGhhdmUgYmVlbiB0 aGlua2luZyBhYm91dCB0aGUKPiBwb3NzaWJpbGl0eSBvZiBpbnRyb2R1Y2luZyBHRU1fQlVHX09O IGludG8gR1ZULWcgcmVjZW50bHkgYW5kCj4gaW52ZXN0aWdhdGluZyBvbiBpdC4gSSdtIGp1c3Qg YSBiaXQgY29uZnVzZWQgYWJvdXQgdGhlIHVzYWdlIGJldHdlZW4KPiBHRU1fQlVHX09OIGFuZCBX QVJOX09OLgoKR0VNX0JVR19PTiBpcyBiYXNpY2FsbHkgdGhlcmUgdG8gY2F0Y2ggdGhpbmdzIHRo YXQgd2UgZG8gbm90IGV4cGVjdApldmVyIHRvIGhhcHBlbiB3aXRoaW4gdGhlIGRyaXZlci4gU28g d2Ugb2Z0ZW4gbGlzdCB0aGUgZnVuY3Rpb24KcHJlY29uZGl0aW9ucyBhcyBHRU1fQlVHX09OLiBJ dCdzIHRoZXJlIGZvciB0aGUgc2FtZSByZWFzb24gYXMgdGhlCmxvY2tkZXBfYXNzZXJ0X2hlbGQg YW5kIEtBU0FOLiBJdCdzIHNvbWV0aW1lcyBoZWF2eSBjaGVja3MgdGhhdCB3ZQpyZWFsbHkgd2Fu dCB0byBydW4gd2hlbiBmdW5jdGlvbmFsbHkgdmFsaWRhdGluZyBrZXJuZWwuCgpHRU1fQlVHX09O IGJlY2FtZSB0byBleGlzdGVuY2UgYmVjYXVzZSBhZGRpbmcgY2hlY2tzIGZvciBvYnZpb3VzCmNv bmRpdGlvbnMgYXQgdGhlIGNyaXRpY2FsIGNvbW1hbmQgc3VibWlzc2lvbiBwYXRoIEdFTSBpcyBu b3QKc3VzdGFpbmFibGUgZm9yIHBlcmZvcm1hbmNlIGluIHByb2R1Y3Rpb24uCgpUaGUgZXhwZWN0 YXRpb24gaXMgdGhhdCBlYWNoIEdFTV9CVUdfT04gaGFzIGEgdGVzdGNhc2UgaW4gSS1HLVQgdGhh dApoYXMgdGhlIHBvdGVudGlhbCB0byBoaXQgaXQgaWYgZHJpdmVyIHdhcyBtb2RpZmllZCBub3Qg dG8gcmVzcGVjdCB0aG9zZQpwcmVjb25kaXRpb25zLiBTbyBvbmNlIG91ciB0ZXN0ZXN0IHBhc3Nl cywgd2UgY2FuIGRpc2FibGUgdGhlCkdFTV9CVUdfT05zIGFuZCBiZSBjb25maWRlbnQgb2YgdGhl IGludGVybmFsIGRyaXZlciBxdWFsaXR5IGFuZCBnZXQgdGhlCnJlbGVhc2UgcGVyZm9ybWFuY2Uu CgpXQVJOX09OIGlzIG1vc3RseSB1c2VkIGZvciB0aGUgY2FzZXMgd2hlbiB0aGUgaGFyZHdhcmUg aXMgYmVoYXZpbmcKZGlmZmVyZW50bHkgdGhhbiB3ZSBleHBlY3QuIFdlIGNhbid0IHJlbW92ZSB0 aGVtIGFzIHdlIGRvbid0IGhhdmUgYWxsCnRoZSBoYXJkd2FyZSBpbiB0aGUgd29ybGQgdG8gdGVz dCwgYnV0IHdlIHRyeSB0byBleGVyY2lzZSB0aGVtIHRvbwp0aHJvdWdoIEktRy1Ucy4gVGhlIHRl c3Qgd2lsbCBvZnRlbiBiZSB0aGUgc3VidGVzdCB0aGF0IHdhcyB3cml0dGVuIHRvCnJlcHJvZHVj ZSB0aGUgcHJvYmxlbSB3aXRoIG91ciBleHBlY3RhdGlvbnMgb2YgaGFyZHdhcmUgaW4gY2FzZSBv ZgpoYW5ncyBhbmQgb3RoZXIgYnVncy4gQWZ0ZXIgd2UndmUgY29ycmVjdGVkIHRoZSBkcml2ZXIg YmVoYXZpb3VyLCBvcgpnb3QgYSBoYXJkd2FyZSBXL0EgYXNzaWduZWQsIHdlIGtlZXAgdGhlIHRl c3QgYW5kIGFkZCBhIFdBUk5fT04gdG8gbWFrZQpzdXJlIHRoZXJlIHdpbGwgYmUgbm8gcmVncmVz c2lvbiBiYWNrIHRvIHRoZSBzYW1lIHNpdHVhdGlvbi4KClRoaXMgaXMgYXQgbGVhc3Qgd2hhdCBz aG91bGQgaGFwcGVuLCBnaXZlbiB0aW1lIGNvbnN0cmFpbnRzLCB0aGVyZSBtYXkKYmUgdmFyaWF0 aW9ucy4KClVzZXIgYmVoYXZpbmcgdW5leHBlY3RlZGx5IHNob3VsZCBuZXZlciByZXN1bHQgaW4g V0FSTl9PTiAob3IgZXZlbgp3b3JzZSwgQlVHX09OKSwgc2hvdWxkIGFsd2F5cyBqdXN0IGJlIGRl YnVnIG1lc3NhZ2VzIGRpc3BsYXllZCAobm90IHRvCnRyaWdnZXIgdGhlIENJKSBhbmQgZXJyb3Jz IHByb3BhZ2F0ZWQgYmFjayB0byB1c2VyOgoKaHR0cHM6Ly8wMS5vcmcvbGludXhncmFwaGljcy9n ZngtZG9jcy9kcm0vZ3B1L2RybS11YXBpLmh0bWwjcmVjb21tZW5kZWQKLWlvY3RsLXJldHVybi12 YWx1ZXMKCkJhcmUgQlVHX09OIHNob3VsZCBvbmx5IGJlIHVzZWQgd2hlbiB0aGVyZSdzIHRoZSBk YW5nZXIgb2YgY29ycnVwdGluZwpzeXN0ZW0gbWVtb3J5IG9yIGZpbGVzeXN0ZW1zLCBzbyBmcm9t IGdyYXBoaWNzIGRyaXZlciwgdGhhdCdzIG5vdCB2ZXJ5Cm9mdGVuLiBDb250cm9sbGVkIHByb3Bh Z2F0aW9uIG9mIGVycm9ycyBhbmQgbWF5YmUgV0FSTl9PTiBpcyBhbHdheXMKcHJlZmVycmVkIGlm IHBvc3NpYmxlLgoKCj4gR0VNX0JVR19PTiBpcyBvbmx5IGVuYWJsZWQgd2hlbiBrZXJuZWwgZGVi dWcgaXMgZW5hYmxlZCwgd2hpY2ggbW9zdGx5Cj4gaXMgZGlzYWJsZWQgaW4gYSBwcm9kdWN0aW9u IGtlcm5lbC4gSW4gdGhlIGNhc2Ugb2YgaTkxNSwgSSdtIHN1cmUgaXQKPiB3aWxsIGJlIGVuYWJs ZWQgaW4gQ0kgdGVzdCBzbyB0aGF0IGl0IGNhbiBjYXRjaCBicm9rZW4gY29kZSBwYXRoLgo+IExv b2tpbmcgaW50byBHVlQtZywgdGhlIHNpbWlsYXIgc2NlbmFyaW8gaXMgd2UgZW5hYmxlIGl0IGlu IFFBIHRlc3QuCj4gCj4gTGV0J3Mgc2F5IEdFTV9CVUdfT04gY2FuIGRvIGl0cyB3b3JrIHZlcnkg d2VsbCBpbiBRQSB0ZXN0IGJ1dCBRQSB0ZXN0Cj4gaXMgbm90IGZ1bGx5IGNvdmVyZWQgYWxsIHRo ZSBjb25kaXRpb24sIHRoZW4gc29tZXRoaW5nIG1pZ2h0IGJlIHN0aWxsCj4gYnJva2VuIHdoZW4g aXQgY29tZXMgdG8gdGhlIHByb2R1Y3Rpb24ga2VybmVsIGZvciB1c2VyIGFuZCBHRU1fQlVHX09O Cj4gd2lsbCBiZSBkaXNhYmxlZCBhbmQgd2lsbCBub3QgY2F0Y2ggdGhhdCwgSSBndWVzcy4KPiAK PiBUaGF0J3MgbXkgY29uZnVzaW9uIHdoaWNoIHNjcmF0Y2hlZCBteSBtaW5kIGR1cmluZyB0aGUg aW52ZXN0aWdhdGlvbjoKPiBJZiBHRU1fQlVHX09OIGlzIG5vdCBhbHdheXMgd29ya2luZywgdGhl biBpdCBsb29rcyBXQVJOX09OIHNob3VsZAo+IGFsd2F5cyBiZSB1c2VkLi4uLiBFeHBlY3RlZCB0 byBsZWFybiBtb3JlIGFib3V0IHRoZSBzdG9yeSBiZWhpbmQuIDopCgpTbyBpZiB0aGUgc2F5aW5n IGlzIHNvbWUgb2JqZWN0IGlzICJuZXZlciBnb2luZyB0byBiZSBiaWdnZXIgdGhhbiAyRyIsCnRo ZXJlIHNob3VsZCBiZSBlaXRoZXI6CgoxLiBHRU1fQlVHX09OIGxpa2UgYXNzZXJ0aW9uIGZvciBp dCBhbmQgYSB0ZXN0IHRoYXQgdHJpZXMgdG8gaGl0IGl0LCBieQp0cnlpbmcgdG8gYWxsb2NhdGUg YSBodWdlIG9iamVjdCBmb3IgZXhhbXBsZSwgYW5kIHNob3VsZCBnZXQgcmVqZWN0aW9uCmFzIC1F SU5WQUwKCjIuIFRlc3QgdG8gc2VlIGlmIHRoZSBvYmplY3QgaXMgYmlnZ2VyLCBhbmQgcHJvcGFn YXRlIGJhY2sgdGhlIGVycm9yIGlmCml0IGlzLiBFaXRoZXIgcmVzdWx0aW5nIGluIHVzZXIgcmVw b3J0ZWQgZXJyb3IgaWYgdGhlIG9yaWdpbiBvZiB0aGUKb2JqZWN0IGlzIG91dHNpZGUgb2Yga2Vy bmVsIDwtPiBoYXJkd2FyZS4gT3IgYSBXQVJOX09OIGlmIGl0J3Mgc3RyYW5nZQpoYXJkd2FyZSBv ciBrZXJuZWwgZHJpdmVyIGJlaGF2aW9yLgoKWW91IHNob3VsZCBjaG9vc2UgZGVwZW5kaW5nIG9u IGhvdyBvZnRlbiB5b3VyIGZ1bmN0aW9uIGdldHMgY2FsbGVkLCBhbmQKaG93IGNyaXRpY2FsIHRo ZSBleGVjdXRpb24gdGltZSBpcy4KCkhvcGVmdWxseSB0aGlzIGNsYXJpZmllZCB0aGluZ3MuCgpS ZWdhcmRzLCBKb29uYXMKLS0gCkpvb25hcyBMYWh0aW5lbgpPcGVuIFNvdXJjZSBUZWNobm9sb2d5 IENlbnRlcgpJbnRlbCBDb3Jwb3JhdGlvbgpfX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fXwpJbnRlbC1nZnggbWFpbGluZyBsaXN0CkludGVsLWdmeEBsaXN0cy5m cmVlZGVza3RvcC5vcmcKaHR0cHM6Ly9saXN0cy5mcmVlZGVza3RvcC5vcmcvbWFpbG1hbi9saXN0 aW5mby9pbnRlbC1nZngK From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752131AbdIVLLY (ORCPT ); Fri, 22 Sep 2017 07:11:24 -0400 Received: from mga14.intel.com ([192.55.52.115]:1569 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751977AbdIVLLW (ORCPT ); Fri, 22 Sep 2017 07:11:22 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.42,427,1500966000"; d="scan'208";a="1017319888" Message-ID: <1506078663.6232.65.camel@linux.intel.com> Subject: Re: [PATCH][drm-next] drm/i915/gvt: ensure -ve return value is handled correctly From: Joonas Lahtinen To: "Wang, Zhi A" , Zhenyu Wang , Joe Perches Cc: "Gao, Fred" , David Airlie , "intel-gfx@lists.freedesktop.org" , "kernel-janitors@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Jani Nikula , "dri-devel@lists.freedesktop.org" , "Vivi, Rodrigo" , Colin King , "intel-gvt-dev@lists.freedesktop.org" Date: Fri, 22 Sep 2017 14:11:03 +0300 In-Reply-To: References: <20170919155534.25334-1-colin.king@canonical.com> <20170919214614.cfiolgznopouv34e@zhen-hp.sh.intel.com> <1505874923.2067.14.camel@perches.com> <20170920224406.jscthkglwfy3xhtf@zhen-hp.sh.intel.com> <1506004318.5382.4.camel@linux.intel.com> Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.24.5 (3.24.5-1.fc26) Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 2017-09-21 at 16:17 +0000, Wang, Zhi A wrote: > Hi Joonas: > > Thanks for the introduction. I have been thinking about the > possibility of introducing GEM_BUG_ON into GVT-g recently and > investigating on it. I'm just a bit confused about the usage between > GEM_BUG_ON and WARN_ON. GEM_BUG_ON is basically there to catch things that we do not expect ever to happen within the driver. So we often list the function preconditions as GEM_BUG_ON. It's there for the same reason as the lockdep_assert_held and KASAN. It's sometimes heavy checks that we really want to run when functionally validating kernel. GEM_BUG_ON became to existence because adding checks for obvious conditions at the critical command submission path GEM is not sustainable for performance in production. The expectation is that each GEM_BUG_ON has a testcase in I-G-T that has the potential to hit it if driver was modified not to respect those preconditions. So once our testest passes, we can disable the GEM_BUG_ONs and be confident of the internal driver quality and get the release performance. WARN_ON is mostly used for the cases when the hardware is behaving differently than we expect. We can't remove them as we don't have all the hardware in the world to test, but we try to exercise them too through I-G-Ts. The test will often be the subtest that was written to reproduce the problem with our expectations of hardware in case of hangs and other bugs. After we've corrected the driver behaviour, or got a hardware W/A assigned, we keep the test and add a WARN_ON to make sure there will be no regression back to the same situation. This is at least what should happen, given time constraints, there may be variations. User behaving unexpectedly should never result in WARN_ON (or even worse, BUG_ON), should always just be debug messages displayed (not to trigger the CI) and errors propagated back to user: https://01.org/linuxgraphics/gfx-docs/drm/gpu/drm-uapi.html#recommended -ioctl-return-values Bare BUG_ON should only be used when there's the danger of corrupting system memory or filesystems, so from graphics driver, that's not very often. Controlled propagation of errors and maybe WARN_ON is always preferred if possible. > GEM_BUG_ON is only enabled when kernel debug is enabled, which mostly > is disabled in a production kernel. In the case of i915, I'm sure it > will be enabled in CI test so that it can catch broken code path. > Looking into GVT-g, the similar scenario is we enable it in QA test. > > Let's say GEM_BUG_ON can do its work very well in QA test but QA test > is not fully covered all the condition, then something might be still > broken when it comes to the production kernel for user and GEM_BUG_ON > will be disabled and will not catch that, I guess. > > That's my confusion which scratched my mind during the investigation: > If GEM_BUG_ON is not always working, then it looks WARN_ON should > always be used.... Expected to learn more about the story behind. :) So if the saying is some object is "never going to be bigger than 2G", there should be either: 1. GEM_BUG_ON like assertion for it and a test that tries to hit it, by trying to allocate a huge object for example, and should get rejection as -EINVAL 2. Test to see if the object is bigger, and propagate back the error if it is. Either resulting in user reported error if the origin of the object is outside of kernel <-> hardware. Or a WARN_ON if it's strange hardware or kernel driver behavior. You should choose depending on how often your function gets called, and how critical the execution time is. Hopefully this clarified things. Regards, Joonas -- Joonas Lahtinen Open Source Technology Center Intel Corporation