Quoting Lucas De Marchi (2024-02-09 03:01:26-03:00)On Thu, Feb 08, 2024 at 04:29:55PM -0800, Daniele Ceraolo Spurio wrote:On 2/7/2024 12:40 PM, Lucas De Marchi wrote:On Wed, Feb 07, 2024 at 10:34:07AM -0800, Daniele Ceraolo Spurio wrote:On 2/7/2024 8:42 AM, Lucas De Marchi wrote:+Gustavo who is dealing with DMC firmware latelyHey, guys. Sorry for being so late for the party...On Wed, Feb 07, 2024 at 03:30:59AM +0000, Matthew Brost wrote:On Tue, Feb 06, 2024 at 05:18:50PM -0800, John Harrison wrote:On 2/6/2024 15:41, Daniele Ceraolo Spurio wrote:Supporting older GuC versions comes with baggage, both on the coding side (due to interfaces only being available from a certain version onwards) and on the testing side (due to having to make surethe driverworks as expected with older GuCs). Since all of our Xe platform are still under force probe, we haven't committed to support any specific GuC version and we therefore don't need to support the older once, which means that we can forcea bottomlimit to what GuC we accept. This allows us to remove any conditional statements based on older GuC versions and also to approach newer additions knowing that we'll never attempt to load something older than our minimum requirement. RFC: this patch sets the minimum to the current GuC version (70.19), but that can be moved one way or the other. The main aim here isIdeally, this would be bumped every time we update Xe to a newer firmware version right up to the point when force probe is lifted. At that point it becomes fixed and we have to add the version check support back in for future w/a's and features. Get's my vote :).Yeah, but see my other reply... I think we will have to wait the firmware being available in linux-firmware for that. Also, let's kickstart a discussion on our process with some possible changes so we can get it documented. I think we have a good opportunity here to start adopting the https://gitlab.freedesktop.org/drm/firmware repo. Rough idea: 1) use intel-staging branch with tags for pull requests to linux-firmware, just like documented in their readme. IMO the naming is rather unfortunate since it would be good to use it for (2) below.... but since it's already usedAgreed. The <vendor>-staging branches, as proposed in the README, look more like some sort of <vendor>-next type of branch. I think it would be good if vendors could have their own `<vendor>/*` (or `<vendor>-*`) namespace for branch names so that, while having some common conventions, they could also adapt parts of it to their needs. For example, we could have branches like intel/next for (1) and intel/ci for (2). Not sure how easy it is to do this now, though.we can use something else. this would mainly replace the use we have today for https://cgit.freedesktop.org/drm/drm-firmware/ , which could be retired. From upstream linux-firmware pov the only change would be the remote location and that we start using tags for the pull requests, coming from a single branch regardless of the firmware (guc, huc, dmc, gsc): intel-staging. Once accepted in linux-firmware, the branch is fast-forwarded.I think this needs a bit more fleshing out, because before we do a pull request, we do want to run CI on the blobs. Also, in several occasions we went through a couple of versions before we closed on what to push to linux-firmware (e.g. in the latest push we started with 70.19.1 but then pushed 70.19.2), so we can't go to intel-staging until we're actually ready to push. I think the process you have below for mmp blobs should work for this early testing flow as well, but we might end up with a lot of noise in the staging-intel-for-CI branch.that would be a throw away branch where we push stuff to be able to test on CI. I don't think the commit history matters much there. The factThat depends on how CI does things. With the current handling of throwaway branches we have on drm-firmware, a CI request can accidentally roll back another one. e.g., if we push a throwaway branch with a GuC update and then another with a DMC update, the second push will roll-back the GuC to what's on the new branch (likely the linux-firmware version). That's why there was a suggestion ti use a unified branch for CI as well.not sure we are talking about the same thing. It is a unified branch for CI: staging-intel-for-CI is where the mmp + about-to-be-upstreamed-for-the-first-time firmware blobs are added, regardless if it's guc, dmc, huc, etc. IMO it's much simpler since CI basically has to take the additional firmware from this 1 branch. No risk of rolling back another firmware because of the new one.Maybe we are having some confusion here because of the term "throwaway" for intel-staging-for-ci? We use throwaway branches for the current process, but I guess intel-staging-for-ci would not really be a "throwaway" branch per se and a unified one (as already mentioned above). I think using intel-staging-for-ci will be okay if teams take the care of only adding/updating/removing blobs they are responsible for.
that the firmware is available to match what is in the kernel and that there's a documented process for using it in my view trumps the this downside. what I expect would be, considering the LNL case as example: 1) Start testing with the mmp version: a) Add firmware to drm/firmware intel-staging-for-CI b) Add commit in topic/xe-for-CI on the kernel side to make use of that firmwareHow would we check for CI after (b)? For DMC, I have been doing something similar. Differences are: * for (a), I am using an intel-ci branch on drm/drm-firmware and send a pull request to intel-gfx so that CI makes the mmp blobs available; * for (b), I send a "[CI]"-tagged patch to intel-gfx making the kernel explicitly use that fully versioned blob path. One advantage here is that I keep a broken DMC release from causing CI noise on existing unrelated patch series.2) Ooops, that has bugs a) add a second mmp firmware to drm/firmware intel-staging-for-CI b) replace commit in topic/xe-for-CI on the kernel side 3) we think we are good, let's try for real a) Add lnl_guc_70.bin to drm/firmware intel-staging-for-CI b) replace commit in topic/xe-for-CI on the kernel side 4) yay, it worked a) Add that lnl_gu_70.bin firmware to intel-staging branch and prepare pull request to linux-firmware b) move patch from topic/xe-for-CI to drm-xe-next: i.e., rebase topic/xe-for-CI on top of drm-xe-next leaving that commit as first one. git push topic/xe-for-CI, dim push drm-xe-next (or implement the logic in dim to push 2 branches) We may need some time between (a) and (b) depending on where we are on the kernel release cycle: we don't want to submit a kernel pull request before the firmware is available @ linux-firmware repo. Note that the fact we are using mmp makes it more complex, although explicit. Going direct with lnl_gu_70.bin would also work and avoid updating the commits on the kernel side.This works for a completely new release. For updating an existing release, we'll have to push, potentially multiple times, all the *_guc_70.bin binaries to intel-staging-for-CI. Just to be clear, I have nothing against this, just noting that it would generate a lot of noise in that branch and potentially use a lot of space on disk.We also need some rules to handle the case where there is already a PR in flight and we need to push some more blobs. This might be as easy as the committer seeing that there are commits on top of master, replying to the previous PR to deprecate it, and then generating a new PR with all the blobs.the pull requests to linux-firmware would come from tags, not a branch. So you have (tip of the branch is on top): o <intel-staging> intel: Add lnl_guc_70.bin o <refs/tags/intel-2024-01-30> intel: Update dg2_guc_70.bin <-- last in flight pull request o intel: Add lnl_dmc.bin o <origin/main> .... <-- where linux-firmware is at Looking at amd-staging, it seems to match what they are doing: https://gitlab.freedesktop.org/drm/firmware/-/commits/amd-staging?ref_type=heads see the amd-$DATE tagsSorry I wasn't very clear in my comment, what I wanted to point out was that if we are on a unified branch and we have the PR against a specific tag (intel-2024-01-30 in your example) already in flight, how do we generate a new PR for the newer commit that comes after the tag (and which will have its own new tag)? Does git do some tag magic and handle it for us, or do we need to generate a new PR that supersedes the one in flight?humn... there is no magic, the old tag is an ascendent path of the new one. But as I said, just coordinating with the few people updating firmware who/when will do the pull request should be sufficient for avoiding a pull request when there's already another one in flight.While I see the benefit of having a unified intel-staging-for-ci, I think I'm failing to see much benefit of having a unified intel-staging here. Wouldn't it be better if pull request were independent of each other? If we had an intel/* (or intel-*) branching namespace, we could keep using throwaway branches for the pull requests and have the discipline of removing them when not needed anymore. -- Gustavo Sousa2) mmp firmware versions are only ever pushed to a separate staging-intel-for-CI branch. There is no pull request in the mailing for this. We can either push directly to the branch or create MRs in gitlab. CI would start using this branch for the extra firmware for platforms instead of whatever it's using today to process the pull requests from the mailing list. Or whatever it's using, because I don't know and don't see it documented anywhere.As long as the CI team is ok with this, I'm all for it.The patch on the kernel side to use the mmp firmware is only ever pushed to the topic/xe-for-CI branch since a) the firmware is coming from a non-official location and b) end users and distro packaging shouldn't see a warning when building the kernel due to a possibly missing firmware 3) Raising firmware version requirement for past platforms used as SDV can be done **unless** it raises the major version. That's because end users would start seeing the warning that we avoided in (2).Who are the end users here? If we're talking about older non-officially supported platforms, the only users should be developers and they should be able to handle having to update the firmwares to a newer major versions.distros and any developer outside Intel. The kernel build system is unaware of xe.force_probe. So if you have, after the several macros: MODULE_FIRMWARE("xe/tgl_guc_71.bin") It will show up in `modinfo -f firmware xe`. And it will show as a warning when installing/packaging a kernel. It doesn't matter for minor/patch updates because the file name is major-only and **running** with that module is protected by the force_probe. The major may be updated when it's available in linux-firmware, which means i915 started using it (for i915 that would be "as an option, with fallback to the previous major release" of course).Ok I get the concern. My assumption here was that we'd only update the minimum required version if that version was in linux-firmware even for minor updates, hence why I didn't see why a major update would be different. I guess we could go with a more relaxed approach where we allow the required minor to be updated for force-probe platforms as long as the firmware is available on a public/CI branch even if it is not in linux-firmware.yep, I don't see it causing issues to end users.Getting back on track with the original purpose of this patch, are you ok with setting the minimum to 70.19 if I first push the matching PVC 70.19 binary (via the old method for now), while we continue sorting out how to manage the new repo?yes. Lucas De MarchiDanieleLucas De MarchiDanielethoughts? Lucas De MarchiMine too. With that: Acked-by: Matthew Brost <matthew.brost@intel.com>John.agreeing to stop supporting very old GuC releases on the newerdriver.Signed-off-by: Daniele Ceraolo Spurio<daniele.ceraolospurio@intel.com>Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> --- drivers/gpu/drm/xe/xe_guc.c | 14 ++------------ drivers/gpu/drm/xe/xe_uc_fw.c | 36++++++++++++++---------------------2 files changed, 16 insertions(+), 34 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc.cb/drivers/gpu/drm/xe/xe_guc.cindex 868208a39829..5e6b27aac495 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -132,15 +132,10 @@ static u32 guc_ctl_ads_flags(structxe_guc *guc)return flags; } -#define GUC_VER(maj, min, pat) (((maj) << 16) | ((min) <<8) | (pat))- static u32 guc_ctl_wa_flags(struct xe_guc *guc) { struct xe_device *xe = guc_to_xe(guc); struct xe_gt *gt = guc_to_gt(guc); - struct xe_uc_fw *uc_fw = &guc->fw; - struct xe_uc_fw_version *version =&uc_fw->versions.found[XE_UC_FW_VER_RELEASE];- u32 flags = 0; if (XE_WA(gt, 22012773006)) @@ -170,13 +165,8 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc) if (XE_WA(gt, 1509372804)) flags |= GUC_WA_RENDER_RST_RC6_EXIT; - if (XE_WA(gt, 14018913170)) { - if (GUC_VER(version->major, version->minor,version->patch) >= GUC_VER(70, 7, 0))- flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6; - else - drm_dbg(&xe->drm, "Skip WA 14018913170: GUCversion expected >= 70.7.0, found %u.%u.%u\n",- version->major, version->minor, version->patch); - } + if (XE_WA(gt, 14018913170)) + flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6; return flags; } diff --git a/drivers/gpu/drm/xe/xe_uc_fw.cb/drivers/gpu/drm/xe/xe_uc_fw.cindex 4714f2c8d2ba..e5bf59616f3d 100644 --- a/drivers/gpu/drm/xe/xe_uc_fw.c +++ b/drivers/gpu/drm/xe/xe_uc_fw.c @@ -296,36 +296,28 @@ static void uc_fw_fini(struct drm_device*drm, void *arg)xe_uc_fw_change_status(uc_fw, XE_UC_FIRMWARE_SELECTED); } -static void guc_read_css_info(struct xe_uc_fw *uc_fw, structuc_css_header *css)+static int guc_read_css_info(struct xe_uc_fw *uc_fw, structuc_css_header *css){ struct xe_gt *gt = uc_fw_to_gt(uc_fw); struct xe_uc_fw_version *release =&uc_fw->versions.found[XE_UC_FW_VER_RELEASE];struct xe_uc_fw_version *compatibility =&uc_fw->versions.found[XE_UC_FW_VER_COMPATIBILITY];xe_gt_assert(gt, uc_fw->type == XE_UC_FW_TYPE_GUC); - xe_gt_assert(gt, release->major >= 70); - - if (release->major > 70 || release->minor >= 6) { - /* v70.6.0 adds CSS header support */ - compatibility->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR, - css->submission_version); - compatibility->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR, - css->submission_version); - compatibility->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH, - css->submission_version); - } else if (release->minor >= 3) { - /* v70.3.0 introduced v1.1.0 */ - compatibility->major = 1; - compatibility->minor = 1; - compatibility->patch = 0; - } else { - /* v70.0.0 introduced v1.0.0 */ - compatibility->major = 1; - compatibility->minor = 0; - compatibility->patch = 0; + + /* We don't support GuC releases older than 70.19 */ + if (release->major < 70 || (release->major == 70 &&release->minor < 19)) {+ xe_gt_err(gt, "Unsupported GuC v%u.%u! v70.19 ornewer is required\n",+ release->major, release->minor); + return -EINVAL; } + compatibility->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR,css->submission_version);+ compatibility->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR,css->submission_version);+ compatibility->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH,css->submission_version);+ uc_fw->private_data_size = css->private_data_size; + + return 0; } int xe_uc_fw_check_version_requirements(struct xe_uc_fw *uc_fw) @@ -424,7 +416,7 @@ static int parse_css_header(structxe_uc_fw *uc_fw, const void *fw_data, size_trelease->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH,css->sw_version);if (uc_fw->type == XE_UC_FW_TYPE_GUC) - guc_read_css_info(uc_fw, css); + return guc_read_css_info(uc_fw, css); return 0; }