/**
 * DOC: Hardware workarounds
 *
 * Hardware workarounds are register programming documented to be executed in
 * the driver that fall outside of the normal programming sequences for a
 * platform. There are some basic categories of workarounds, depending on
 * how/when they are applied:
 *
 * - Context workarounds: workarounds that touch registers that are
 *   saved/restored to/from the HW context image. The list is emitted (via Load
 *   Register Immediate commands) once when initializing the device and saved in
 *   the default context. That default context is then used on every context
 *   creation to have a "primed golden context", i.e. a context image that
 *   already contains the changes needed to all the registers.
 *
 *   Context workarounds should be implemented in the \*_ctx_workarounds_init()
 *   variants respective to the targeted platforms.
 *
 * - Engine workarounds: the list of these WAs is applied whenever the specific
 *   engine is reset. It's also possible that a set of engine classes share a
 *   common power domain and they are reset together. This happens on some
 *   platforms with render and compute engines. In this case (at least) one of
 *   them needs to keep the workaround programming: the approach taken in the
 *   driver is to tie those workarounds to the first compute/render engine that
 *   is registered. When executing with GuC submission, engine resets are
 *   outside of kernel driver control, hence the list of registers involved is
 *   written once, on engine initialization, and then passed to GuC, which
 *   saves/restores their values before/after the reset takes place. See
 *   ``drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c`` for reference.
 *
 *   Workarounds for registers specific to RCS and CCS should be implemented in
 *   rcs_engine_wa_init() and ccs_engine_wa_init(), respectively; those for
 *   registers belonging to BCS, VCS or VECS should be implemented in
 *   xcs_engine_wa_init(). Workarounds for registers not belonging to a specific
 *   engine's MMIO range but that are part of the common RCS/CCS reset domain
 *   should be implemented in general_render_compute_wa_init(). The settings
 *   about the CCS load balancing should be added in ccs_engine_wa_mode().
 *
 * - GT workarounds: the list of these WAs is applied whenever these registers
 *   revert to their default values: on GPU reset, suspend/resume [1]_, etc.
 *
 *   GT workarounds should be implemented in the \*_gt_workarounds_init()
 *   variants respective to the targeted platforms.
 *
 * - Register whitelist: some workarounds need to be implemented in userspace,
 *   but need to touch privileged registers. The whitelist in the kernel
 *   instructs the hardware to allow the access to happen. From the kernel side,
 *   this is just a special case of a MMIO workaround (as we write the list of
 *   these to-be-whitelisted registers to some special HW registers).
 *
 *   Register whitelisting should be done in the \*_whitelist_build() variants
 *   respective to the targeted platforms.
 *
 * - Workaround batchbuffers: buffers that get executed automatically by the
 *   hardware on every HW context restore. These buffers are created and
 *   programmed in the default context so the hardware always goes through those
 *   programming sequences when switching contexts. The support for workaround
 *   batchbuffers is enabled by these hardware mechanisms:
 *
 *   #. INDIRECT_CTX: A batchbuffer and an offset are provided in the default
 *      context, pointing the hardware to jump to that location when that offset
 *      is reached in the context restore. Workaround batchbuffer in the driver
 *      currently uses this mechanism for all platforms.
 *
 *   #. BB_PER_CTX_PTR: A batchbuffer is provided in the default context,
 *      pointing the hardware to a buffer to continue executing after the
 *      engine registers are restored in a context restore sequence. This is
 *      currently not used in the driver.
 *
 * - Other: There are WAs that, due to their nature, cannot be applied from a
 *   central place. Those are peppered around the rest of the code, as needed.
 *   Workarounds related to the display IP are the main example.
 *
 * .. [1] Technically, some registers are powercontext saved & restored, so they
 *    survive a suspend/resume. In practice, writing them again is not too
 *    costly and simplifies things, so it's the approach taken in the driver.
 */
if (IS_ALIGNED(wal->count, grow)) { /* Either uninitialized or full. */
struct i915_wa *list;

list = kmalloc_array(ALIGN(wal->count + 1, grow), sizeof(*list),
GFP_KERNEL);
if (!list) {
drm_err(&i915->drm, "No space for workaround init!\n");
return;
}
if (wal->list) {
memcpy(list, wal->list, sizeof(*wa) * wal->count);
kfree(wal->list);
}
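/*
 * Illustrative sketch, not driver code: the IS_ALIGNED(wal->count, grow) test
 * above is true both when the list is empty (count == 0) and when the last
 * chunk is full (count is a multiple of grow), so one check covers both
 * "uninitialized" and "full". The user-space version below assumes a
 * power-of-two chunk size. example_list, example_list_add(),
 * example_list_capacity() and EXAMPLE_GROW are hypothetical names, and
 * malloc() stands in for kmalloc_array().
 */

```c
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_GROW 16u /* hypothetical chunk size; must be a power of two */

struct example_list {
	unsigned int *items;
	unsigned int count;
};

/* Append one value; returns 0 on success, -1 if the allocation failed. */
static int example_list_add(struct example_list *l, unsigned int value)
{
	/* True for count == 0 and for every multiple of EXAMPLE_GROW. */
	if ((l->count & (EXAMPLE_GROW - 1)) == 0) {
		unsigned int *items;

		/* Same size as ALIGN(count + 1, grow) when count is aligned. */
		items = malloc((l->count + EXAMPLE_GROW) * sizeof(*items));
		if (!items)
			return -1;

		if (l->items) {
			memcpy(items, l->items, l->count * sizeof(*items));
			free(l->items);
		}
		l->items = items;
	}

	l->items[l->count++] = value;
	return 0;
}

/* Number of slots allocated for a list that currently holds count items. */
static unsigned int example_list_capacity(unsigned int count)
{
	return (count + EXAMPLE_GROW - 1) & ~(EXAMPLE_GROW - 1);
}
```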
/*
 * WA operations on "masked register". A masked register has the upper 16 bits
 * documented as "masked" in b-spec. Its purpose is to allow writing to just a
 * portion of the register without a rmw: you simply write in the upper 16 bits
 * the mask of bits you are going to modify.
 *
 * The wa_masked_* family of functions already does the necessary operations to
 * calculate the mask based on the parameters passed, so the user only has to
 * provide the lower 16 bits of that register.
 */
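/*
 * Illustrative sketch, not driver code: one way to picture the masked-register
 * encoding described above. All example_* names are hypothetical, and
 * example_apply_masked_write() is only an assumed model of how the hardware
 * treats such a write: bits whose mask bit (upper half) is set take the new
 * value, all other bits keep their old value.
 */

```c
#include <stdint.h>

/* Put the mask in the upper 16 bits and the new field value in the lower 16. */
static uint32_t example_masked_field(uint32_t mask, uint32_t value)
{
	return (mask << 16) | (value & mask);
}

/* Set the given bits: mask and value are the same bits. */
static uint32_t example_masked_bit_enable(uint32_t bits)
{
	return example_masked_field(bits, bits);
}

/* Clear the given bits: mask selects them, value leaves them zero. */
static uint32_t example_masked_bit_disable(uint32_t bits)
{
	return example_masked_field(bits, 0);
}

/* Assumed hardware behaviour for a write to a 16-bit masked register. */
static uint32_t example_apply_masked_write(uint32_t old, uint32_t write)
{
	uint32_t mask = write >> 16;

	return (old & ~mask & 0xffffu) | (write & mask);
}
```

/*
 * With this model no read-modify-write cycle is needed: writing
 * example_masked_bit_disable(0x0020) to a register holding 0x00ff leaves
 * every bit except bit 5 untouched.
 */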
/* Use Force Non-Coherent whenever executing a 3D context. This is a
 * workaround for a possible hang in the unlikely event a TLB
 * invalidation occurs during a PSD flush.
 */
/* WaForceEnableNonCoherent:bdw,chv */
/* WaHdcDisableFetchWhenMasked:bdw,chv */
wa_masked_en(wal, HDC_CHICKEN0,
HDC_DONOT_FETCH_MEM_WHEN_MASKED |
HDC_FORCE_NON_COHERENT);
/* From the Haswell PRM, Command Reference: Registers, CACHE_MODE_0:
 * "The Hierarchical Z RAW Stall Optimization allows non-overlapping
 * polygons in the same 8x4 pixel/sample area to be processed without
 * stalling waiting for the earlier ones to write to Hierarchical Z
 * buffer."
 *
 * This optimization is off by default for BDW and CHV; turn it on.
 */
wa_masked_dis(wal, CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
/*
 * BSpec recommends 8x4 when MSAA is used,
 * however in practice 16x4 seems fastest.
 *
 * Note that PS/WM thread counts depend on the WIZ hashing
 * disable bit, which we don't touch here, but it's good
 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 */
wa_masked_field_set(wal, GEN7_GT_MODE,
GEN6_WIZ_HASHING_MASK,
GEN6_WIZ_HASHING_16x4);
}
/* WaDisableDopClockGating:bdw
 *
 * Also see the related UCGTCL1 write in bdw_init_clock_gating()
 * to disable EUTC clock gating.
 */
wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN2,
DOP_CLOCK_GATING_DISABLE);
/* WaForceEnableNonCoherent and WaDisableHDCInvalidation are
 * both tied to WaForceContextSaveRestoreNonCoherent
 * in some hsds for skl. We keep the tie for all gen9. The
 * documentation is a bit hazy and so we want to get common behaviour,
 * even though there is no clear evidence we would need both on kbl/bxt.
 * This area has been source of system hangs so we play it safe
 * and mimic the skl regardless of what bspec says.
 *
 * Use Force Non-Coherent whenever executing a 3D context. This
 * is a workaround for a possible hang in the unlikely event
 * a TLB invalidation occurs during a PSD flush.
 */
/*
 * Supporting preemption with fine-granularity requires changes in the
 * batch buffer programming. Since we can't break old userspace, we
 * need to set our default preemption level to safe value. Userspace is
 * still able to use more fine-grained preemption levels, since in
 * WaEnablePreemptionGranularityControlByUMD we're whitelisting the
 * per-ctx register. As such, WaDisable{3D,GPGPU}MidCmdPreemption are
 * not real HW workarounds, but merely a way to start using preemption
 * while maintaining old contract with userspace.
 */
/*
 * Only consider slices where one, and only one, subslice has 7
 * EUs
 */
if (!is_power_of_2(gt->info.sseu.subslice_7eu[i]))
continue;

/*
 * subslice_7eu[i] != 0 (because of the check above) and
 * ss_max == 4 (maximum number of subslices possible per slice)
 *
 * ->    0 <= ss <= 3;
 */
ss = ffs(gt->info.sseu.subslice_7eu[i]) - 1;
vals[i] = 3 - ss;
}
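/*
 * Illustrative sketch, not driver code: the per-slice computation above as a
 * standalone function. A slice is skipped unless exactly one subslice has 7
 * EUs (its mask is a power of two); otherwise the index of that subslice is
 * mapped to 3 - ss. The example_* names are hypothetical, and a plain loop
 * replaces the kernel's ffs() helper.
 */

```c
#include <stdbool.h>

/* True when exactly one bit is set. */
static bool example_is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Zero-based index of the lowest set bit; caller guarantees mask != 0. */
static int example_lowest_bit(unsigned int mask)
{
	int i = 0;

	while (!(mask & 1u)) {
		mask >>= 1;
		i++;
	}
	return i;
}

/*
 * Value for one slice, or -1 when the slice is skipped because zero or
 * several subslices have 7 EUs. With at most 4 subslices per slice the
 * result lies in the range 0..3.
 */
static int example_7eu_value(unsigned int subslice_7eu_mask)
{
	int ss;

	if (!example_is_power_of_2(subslice_7eu_mask))
		return -1;

	ss = example_lowest_bit(subslice_7eu_mask);
	return 3 - ss;
}
```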
/* WaForceEnableNonCoherent:icl
 * This is not the same workaround as in early Gen9 platforms, where
 * lacking this could cause system hangs, but coherency performance
 * overhead is high and only a few compute workloads really need it
 * (the register is whitelisted in hardware now, so UMDs can opt in
 * for coherency if they have a good reason).
 */
wa_mcr_masked_en(wal, ICL_HDC_MODE, HDC_FORCE_NON_COHERENT);
/*
 * Wa_16011163337 - GS_TIMER
 *
 * TDS_TIMER: Although some platforms refer to it as Wa_1604555607, we
 * need to program it even on those that don't explicitly list that
 * workaround.
 *
 * Note that the programming of GEN12_FF_MODE2 is further modified
 * according to the FF_MODE2 guidance given by Wa_1608008084.
 * Wa_1608008084 tells us the FF_MODE2 register will return the wrong
 * value when read from the CPU.
 *
 * The default value for this register is zero for all fields.
 * So instead of doing a RMW we should just write the desired values
 * for TDS and GS timers. Note that since the readback can't be trusted,
 * the clear mask is just set to ~0 to make sure other bits are not
 * inadvertently set. For the same reason read verification is ignored.
 */
wa_add(wal,
GEN12_FF_MODE2,
~0,
FF_MODE2_TDS_TIMER_128 | FF_MODE2_GS_TIMER_224,
0, false);
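/*
 * Illustrative sketch, not driver code: why a clear mask of ~0 and a read mask
 * of 0 make the FF_MODE2 entry above independent of the untrustworthy
 * readback. It assumes an entry is applied as (old & ~clr) | set and verified
 * only on the bits selected by the read mask; both formulas and the example_*
 * names are modelling assumptions for this sketch.
 */

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed apply step: clear the clr bits of the old value, then OR in set. */
static uint32_t example_apply(uint32_t old, uint32_t clr, uint32_t set)
{
	return (old & ~clr) | set;
}

/* Assumed verify step: only bits selected by read_mask are compared. */
static bool example_verify(uint32_t readback, uint32_t set, uint32_t read_mask)
{
	return ((readback ^ set) & read_mask) == 0;
}
```

/*
 * With clr == ~0 the old value drops out entirely, so a bogus readback cannot
 * leak stray bits into the write; with read_mask == 0 the verification step
 * always passes.
 */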
if (!IS_DG1(i915)) {
/* Wa_1806527549 */
wa_masked_en(wal, HIZ_CHICKEN, HZ_DEPTH_TEST_LE_GE_OPT_DISABLE);

/*
 * This bit must be set to enable performance optimization for fast
 * clears.
 */
wa_mcr_write_or(wal, GEN8_WM_CHICKEN2, WAIT_ON_DEPTH_STALL_DONE_DISABLE);
}
/*
 * Due to Wa_16014892111, the DRAW_WATERMARK tuning must be done in
 * gen12_emit_indirect_ctx_rcs() rather than here on some early
 * steppings.
 */
if (!(IS_GFX_GT_IP_STEP(gt, IP_VER(12, 70), STEP_A0, STEP_B0) ||
IS_GFX_GT_IP_STEP(gt, IP_VER(12, 71), STEP_A0, STEP_B0)))
wa_add(wal, DRAW_WATERMARK, VERT_WM_VAL, 0x3FF, 0, false);
}
static void fakewa_disable_nestedbb_mode(struct intel_engine_cs *engine, struct i915_wa_list *wal)
{
/*
 * This is a "fake" workaround defined by software to ensure we
 * maintain reliable, backward-compatible behavior for userspace with
 * regards to how nested MI_BATCH_BUFFER_START commands are handled.
 *
 * The per-context setting of MI_MODE[12] determines whether the bits
 * of a nested MI_BATCH_BUFFER_START instruction should be interpreted
 * in the traditional manner or whether they should instead use a new
 * tgl+ meaning that breaks backward compatibility, but allows nesting
 * into 3rd-level batchbuffers. When this new capability was first
 * added in TGL, it remained off by default unless a context
 * intentionally opted in to the new behavior. However Xe_HPG now
 * flips this on by default and requires that we explicitly opt out if
 * we don't want the new behavior.
 *
 * From a SW perspective, we want to maintain the backward-compatible
 * behavior for userspace, so we'll apply a fake workaround to set it
 * back to the legacy behavior on platforms where the hardware default
 * is to break compatibility. At the moment there is no Linux
 * userspace that utilizes third-level batchbuffers, so this will save
 * userspace from needing to make any changes; using the legacy
 * meaning is the correct thing to do. If/when we have userspace
 * consumers that want to utilize third-level batch nesting, we can
 * provide a context parameter to allow them to opt-in.
 */
wa_masked_dis(wal, RING_MI_MODE(engine->mmio_base), TGL_NESTED_BB_EN);
}
/*
 * Some blitter commands do not have a field for MOCS; those
 * commands will use the MOCS index pointed to by BLIT_CCTL.
 * The BLIT_CCTL registers need to be programmed to un-cached.
 */
if (engine->class == COPY_ENGINE_CLASS) {
mocs = engine->gt->mocs.uc_index;
wa_write_clr_set(wal,
BLIT_CCTL(engine->mmio_base),
BLIT_CCTL_MASK,
BLIT_CCTL_MOCS(mocs, mocs));
}
}
/*
 * gen12_ctx_gt_fake_wa_init() doesn't program an official workaround
 * defined by the hardware team, but it does program general context registers.
 * Adding that context register programming to the context workaround list
 * allows us to use the wa framework for proper application and validation.
 */
static void
gen12_ctx_gt_fake_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
{
if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55))
fakewa_disable_nestedbb_mode(engine, wal);
/* Applies to all engines */
/*
 * Fake workarounds are not actual workarounds but
 * programming of context registers using the workaround framework.
 */
if (GRAPHICS_VER(i915) >= 12)
gen12_ctx_gt_fake_wa_init(engine, wal);
/*
 * WaProgramMgsrForCorrectSliceSpecificMmioReads:gen9,glk,kbl,cml
 * Before any MMIO read into slice/subslice specific registers, MCR
 * packet control register needs to be programmed to point to any
 * enabled s/ss pair. Otherwise, incorrect values will be returned.
 * This means each subsequent MMIO read will be forwarded to a
 * specific s/ss combination, but this is OK since these registers
 * are consistent across s/ss in almost all cases. In the rare
 * occasions, such as INSTDONE, where this value is dependent
 * on s/ss combo, the read should be done with read_subslice_reg.
 */
slice = ffs(sseu->slice_mask) - 1;
GEM_BUG_ON(slice >= ARRAY_SIZE(sseu->subslice_mask.hsw));
subslice = ffs(intel_sseu_get_hsw_subslices(sseu, slice));
GEM_BUG_ON(!subslice);
subslice--;
/*
 * We use GEN8_MCR..() macros to calculate the |mcr| value for
 * Gen9 to address WaProgramMgsrForCorrectSliceSpecificMmioReads
 */
mcr = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice);
mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK;
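/*
 * Illustrative sketch, not driver code: selecting the lowest enabled
 * slice/subslice pair and packing it into a steering value, as the fragment
 * above does with ffs() and the GEN8_MCR_*() macros. The field positions used
 * by EXAMPLE_MCR_SLICE() and EXAMPLE_MCR_SUBSLICE() are assumptions made for
 * this example only, and every example_* name is hypothetical.
 */

```c
#include <stdint.h>

/* Assumed layout: 2-bit slice field at bit 26, 2-bit subslice field at bit 24. */
#define EXAMPLE_MCR_SLICE(s)		(((uint32_t)(s) & 3u) << 26)
#define EXAMPLE_MCR_SUBSLICE(ss)	(((uint32_t)(ss) & 3u) << 24)

/* Zero-based index of the lowest set bit, or -1 if mask is zero. */
static int example_lowest_bit(uint32_t mask)
{
	int i;

	for (i = 0; i < 32; i++)
		if (mask & (UINT32_C(1) << i))
			return i;
	return -1;
}

/*
 * Pack the lowest enabled slice and subslice into *mcr.
 * Returns 0 on success, -1 if either mask has no bit set.
 */
static int example_mcr_value(uint32_t slice_mask, uint32_t subslice_mask,
			     uint32_t *mcr)
{
	int slice = example_lowest_bit(slice_mask);
	int subslice = example_lowest_bit(subslice_mask);

	if (slice < 0 || subslice < 0)
		return -1;

	*mcr = EXAMPLE_MCR_SLICE(slice) | EXAMPLE_MCR_SUBSLICE(subslice);
	return 0;
}
```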
/*
 * Although a platform may have subslices, we need to always steer
 * reads to the lowest instance that isn't fused off. When Render
 * Power Gating is enabled, grabbing forcewake will only power up a
 * single subslice (the "minconfig") if there isn't a real workload
 * that needs to be run; this means that if we steer register reads to
 * one of the higher subslices, we run the risk of reading back 0's or
 * random garbage.
 */
subslice = __ffs(intel_sseu_get_hsw_subslices(sseu, 0));
/*
 * If the subslice we picked above also steers us to a valid L3 bank,
 * then we can just rely on the default steering and won't need to
 * worry about explicitly re-steering L3BANK reads later.
 */
if (gt->info.l3bank_mask & BIT(subslice))
gt->steering_table[L3BANK] = NULL;
/*
 * On Xe_HP the steering increases in complexity. There are now several
 * more units that require steering and we're not guaranteed to be able
 * to find a common setting for all of them. These are:
 * - GSLICE (fusable)
 * - DSS (sub-unit within gslice; fusable)
 * - L3 Bank (fusable)
 * - MSLICE (fusable)
 * - LNCF (sub-unit within mslice; always present if mslice is present)
 *
 * We'll do our default/implicit steering based on GSLICE (in the
 * sliceid field) and DSS (in the subsliceid field). If we can
 * find overlap between the valid MSLICE and/or LNCF values with
 * a suitable GSLICE, then we can just reuse the default value and
 * skip any explicit steering at runtime.
 *
 * We only need to look for overlap between GSLICE/MSLICE/LNCF to find
 * a valid sliceid value. DSS steering is the only type of steering
 * that utilizes the 'subsliceid' bits.
 *
 * Also note that, even though the steering domain is called "GSlice"
 * and it is encoded in the register using the gslice format, the spec
 * says that the combined (geometry | compute) fuse should be used to
 * select the steering.
 */
/*
 * Find the potential LNCF candidates. Either LNCF within a valid
 * mslice is fine.
 */
for_each_set_bit(i, &gt->info.mslice_mask, GEN12_MAX_MSLICES)
lncf_mask |= (0x3 << (i * 2));
/*
 * Are there any sliceid values that work for both GSLICE and LNCF
 * steering?
 */
if (slice_mask & lncf_mask) {
slice_mask &= lncf_mask;
gt->steering_table[LNCF] = NULL;
}
/* How about sliceid values that also work for MSLICE steering? */
if (slice_mask & gt->info.mslice_mask) {
slice_mask &= gt->info.mslice_mask;
gt->steering_table[MSLICE] = NULL;
}
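/*
 * Illustrative sketch, not driver code: the overlap search above as a
 * standalone function. Each present mslice contributes two LNCF candidates
 * (bits 2*i and 2*i + 1); the candidate sliceid set is then narrowed first by
 * the LNCF mask and then by the mslice mask, but only when the intersection is
 * non-empty. EXAMPLE_MAX_MSLICES, struct example_steering and the example_*
 * functions are hypothetical; the two flags stand in for clearing the LNCF and
 * MSLICE entries of the steering table.
 */

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_MAX_MSLICES 4 /* assumed upper bound for this sketch */

struct example_steering {
	uint32_t slice_mask;	/* sliceid values still usable as default */
	bool lncf_implicit;	/* default steering also valid for LNCF */
	bool mslice_implicit;	/* default steering also valid for MSLICE */
};

/* Two LNCF bits per present mslice. */
static uint32_t example_lncf_mask(uint32_t mslice_mask)
{
	uint32_t lncf_mask = 0;
	unsigned int i;

	for (i = 0; i < EXAMPLE_MAX_MSLICES; i++)
		if (mslice_mask & (UINT32_C(1) << i))
			lncf_mask |= UINT32_C(0x3) << (i * 2);

	return lncf_mask;
}

static struct example_steering
example_pick_default(uint32_t slice_mask, uint32_t mslice_mask)
{
	struct example_steering s = { slice_mask, false, false };
	uint32_t lncf_mask = example_lncf_mask(mslice_mask);

	/* Narrow to sliceid values that also work for LNCF, if any exist. */
	if (s.slice_mask & lncf_mask) {
		s.slice_mask &= lncf_mask;
		s.lncf_implicit = true;
	}

	/* Narrow further to values that also work for MSLICE, if any exist. */
	if (s.slice_mask & mslice_mask) {
		s.slice_mask &= mslice_mask;
		s.mslice_implicit = true;
	}

	return s;
}
```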
/*
 * SQIDI ranges are special because they use different steering
 * registers than everything else we work with. On XeHP SDV and
 * DG2-G10, any value in the steering registers will work fine since
 * all instances are present, but DG2-G11 only has SQIDI instances at
 * ID's 2 and 3, so we need to steer to one of those. For simplicity
 * we'll just steer to a hardcoded "2" since that value will work
 * everywhere.
 */
__set_mcr_steering(wal, MCFG_MCR_SELECTOR, 0, 2);
__set_mcr_steering(wal, SF_MCR_SELECTOR, 0, 2);
/*
 * On DG2, GAM registers have a dedicated steering control register
 * and must always be programmed to a hardcoded groupid of "1."
 */
if (IS_DG2(gt->i915))
__set_mcr_steering(wal, GAM_MCR_SELECTOR, 1, 0);
}
/*
 * This is not a documented workaround, but rather an optimization
 * to reduce sampler power.
 */
wa_mcr_write_clr(wal, GEN10_DFR_RATIO_EN_AND_CHICKEN, DFR_DISABLE);
}
/*
 * Though there are per-engine instances of these registers,
 * they retain their value through engine resets and should
 * only be provided on the GT workaround list rather than
 * the engine-specific workaround list.
 */
static void
wa_14011060649(struct intel_gt *gt, struct i915_wa_list *wal)
{
struct intel_engine_cs *engine;
int id;
/*
 * Wa_14015795083
 *
 * Firmware on some gen12 platforms locks the MISCCPCTL register,
 * preventing i915 from modifying it for this workaround. Skip the
 * readback verification for this workaround on debug builds; if the
 * workaround doesn't stick due to firmware behavior, it's not an error
 * that we want CI to flag.
 */
wa_add(wal, GEN7_MISCCPCTL, GEN12_DOP_CLOCK_GATE_RENDER_ENABLE,
0, 0, false);
}
/*
 * Wa_14018778641
 * Wa_18018781329
 *
 * Note that although these registers are MCR on the primary
 * GT, the media GT's versions are regular singleton registers.
 */
wa_write_or(wal, XELPMP_GSC_MOD_CTRL, FORCE_MISS_FTLB);
/*
 * Wa_14018575942
 *
 * The issue is seen on media KPI tests running on the VDBOX engine,
 * especially with VP9 encoding workloads.
 */
wa_write_or(wal, XELPMP_VDBX_MOD_CTRL, FORCE_MISS_FTLB);
/*
 * The bspec performance guide has recommended MMIO tuning settings. These
 * aren't truly "workarounds" but we want to program them through the
 * workaround infrastructure to make sure they're (re)applied at the proper
 * times.
 *
 * The programming in this function is for settings that persist through
 * engine resets and also are not part of any engine's register state context.
 * I.e., settings that only need to be re-applied in the event of a full GT
 * reset.
 */
static void gt_tuning_settings(struct intel_gt *gt, struct i915_wa_list *wal)
{
if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 74))) {
wa_mcr_write_or(wal, XEHP_L3SCQREG7, BLEND_FILL_CACHING_OPT_DIS);
wa_mcr_write_or(wal, XEHP_SQCM, EN_32B_ACCESS);
}
for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
ok &= wa_verify(wal->gt, wa, wa->is_mcr ?
intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) :
intel_uncore_read_fw(uncore, wa->reg),
wal->name, from);
__maybe_unused
static bool is_nonpriv_flags_valid(u32 flags)
{
/* Check only valid flag bits are set */
if (flags & ~RING_FORCE_TO_NONPRIV_MASK_VALID)
return false;

/* NB: Only 3 out of 4 enum values are valid for access field */
if ((flags & RING_FORCE_TO_NONPRIV_ACCESS_MASK) ==
RING_FORCE_TO_NONPRIV_ACCESS_INVALID)
return false;
/*
 * WaAllowPMDepthAndInvocationCountAccessFromUMD:cfl,whl,cml,aml
 *
 * This covers 4 registers which are next to one another:
 *   - PS_INVOCATION_COUNT
 *   - PS_INVOCATION_COUNT_UDW
 *   - PS_DEPTH_COUNT
 *   - PS_DEPTH_COUNT_UDW
 */
whitelist_reg_ext(w, PS_INVOCATION_COUNT,
RING_FORCE_TO_NONPRIV_ACCESS_RD |
RING_FORCE_TO_NONPRIV_RANGE_4);
}
/*
 * WaAllowPMDepthAndInvocationCountAccessFromUMD:icl
 *
 * This covers 4 registers which are next to one another:
 *   - PS_INVOCATION_COUNT
 *   - PS_INVOCATION_COUNT_UDW
 *   - PS_DEPTH_COUNT
 *   - PS_DEPTH_COUNT_UDW
 */
whitelist_reg_ext(w, PS_INVOCATION_COUNT,
RING_FORCE_TO_NONPRIV_ACCESS_RD |
RING_FORCE_TO_NONPRIV_RANGE_4);
break;
for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
intel_uncore_write(uncore,
RING_FORCE_TO_NONPRIV(base, i),
i915_mmio_reg_offset(wa->reg));
/* And clear the rest just in case of garbage */
for (; i < RING_MAX_NONPRIV_SLOTS; i++)
intel_uncore_write(uncore,
RING_FORCE_TO_NONPRIV(base, i),
i915_mmio_reg_offset(RING_NOPID(base)));
}
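/*
 * Illustrative sketch, not driver code: the slot-filling pattern used above,
 * with a plain array standing in for the RING_FORCE_TO_NONPRIV registers.
 * Whitelisted register offsets go into the first slots and every remaining
 * slot is pointed at a harmless register so no stale entry survives.
 * EXAMPLE_MAX_SLOTS and example_fill_slots() are hypothetical, and the
 * offsets passed in are arbitrary example values.
 */

```c
#include <stdint.h>

#define EXAMPLE_MAX_SLOTS 12 /* assumed number of whitelist slots */

static void example_fill_slots(uint32_t slots[EXAMPLE_MAX_SLOTS],
			       const uint32_t *offsets, unsigned int count,
			       uint32_t nop_offset)
{
	unsigned int i;

	/* Program one slot per whitelisted register. */
	for (i = 0; i < count && i < EXAMPLE_MAX_SLOTS; i++)
		slots[i] = offsets[i];

	/* Point the unused slots at a harmless register. */
	for (; i < EXAMPLE_MAX_SLOTS; i++)
		slots[i] = nop_offset;
}
```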
/*
 * engine_fake_wa_init() is a placeholder for programming registers
 * which are not part of an official workaround defined by the
 * hardware team.
 * Adding the programming of those registers to the workaround list
 * allows the wa framework to handle their application and verification.
 */
static void
engine_fake_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
{
u8 mocs_w, mocs_r;
/*
 * RING_CMD_CCTL specifies the default MOCS entry that will be used
 * by the command streamer when executing commands that don't have
 * a way to explicitly specify a MOCS setting. The default should
 * usually reference whichever MOCS entry corresponds to uncached
 * behavior, although use of a WB cached entry is recommended by the
 * spec in certain circumstances on specific platforms.
 */
if (GRAPHICS_VER(engine->i915) >= 12) {
mocs_r = engine->gt->mocs.uc_index;
mocs_w = engine->gt->mocs.uc_index;
if (HAS_L3_CCS_READ(engine->i915) &&
engine->class == COMPUTE_CLASS) {
mocs_r = engine->gt->mocs.wb_index;
/*
 * Even on the few platforms where MOCS 0 is a
 * legitimate table entry, it's never the correct
 * setting to use here; we can assume the MOCS init
 * just forgot to initialize wb_index.
 */
drm_WARN_ON(&engine->i915->drm, mocs_r == 0);
}
wa_masked_field_set(wal,