Created attachment 110404
hack/test for alternate drm_intel_gem_bo_references() semantics

Setup:
- HSW GT3e in desktop case
- Ubuntu 14.10 64-bit (kernel 3.16, Xorg 1.16)
- Latest libdrm & Mesa 32-bit builds (2014-11-07)
- Witcher2 game from Steam (32-bit)

Steps:
- Start Witcher2 with latest Mesa
- Select FullHD resolution and highest generic gfx option, then disable anti-aliasing & ubersampling from the advanced options
- Select "Arena" option from the main menu
- After animation stops, click through "discussion" and pan around with mouse

Results:
- When panning around, some orientations show 100% (single) CPU utilization.
- "perf" reports (nearly) *half* of the CPU consumption to happen in the (very small & recursive) libdrm "_drm_intel_gem_bo_references" function.

Analysis:

The only caller of "_drm_intel_gem_bo_references" is the exported "drm_intel_gem_bo_references" function. Tracing the calls to it shows that it is called from Mesa's gen6_check_query() function. [1]

Removing the libdrm _drm_intel_gem_bo_references() CPU bottleneck by doing flushes unconditionally in gen6_check_query() removed most of the CPU consumption, which verifies the "perf" finding. However, those extra flushes made performance marginally worse.

Printing statistics of the relocation counts showed that for Witcher2, the largest relocation count in _drm_intel_gem_bo_references() was 590, but ~97% of the calls had zero relocation counts.

Another test was changing the semantics of "drm_intel_gem_bo_references". This also removed most of the Witcher2 CPU consumption, potentially with a speed improvement.

*On the test machine*, Witcher2 isn't CPU bound despite ~100% CPU load, so CPU usage doesn't directly affect its speed. *However*, on a temperature limited machine (e.g. laptop with GT3), this could have a clear performance impact, as the lowered CPU consumption may allow the GPU to run at a higher clock speed. Power usage should at least be affected.
Attached is a patch/hack (by Francisco Jerez) for testing this.

Conclusion:

There could be two separate functions with slightly different semantics: one that is fast, does something similar to what Francisco proposed, and can be used by (Mesa) functions that don't need more accurate information, and the current libdrm "_drm_intel_gem_bo_references" function for those that do need it.

---

[1] In addition to resource usage tracing, functracer can attach to a running process and track calls to a specified (exported) function:
https://maemo.gitorious.org/maemo-tools/functracer

According to it, the callers were:

194154 calls (for the trace period):
  0xf601e722 drm_intel_bo_references() at intel_bufmgr.c:298
  0xf63ca55c gen6_check_query() at gen6_queryobj.c:329
  0xf6144e8d _mesa_GetQueryObjectiv() at queryobj.c:620

1133 calls:
  0xf601e722 drm_intel_bo_references() at intel_bufmgr.c:298
  0xf63ca34c gen6_queryobj_get_results() at gen6_queryobj.c:128
  0xf63ca583 gen6_check_query() at gen6_queryobj.c:333
  0xf6144e8d _mesa_GetQueryObjectiv() at queryobj.c:620
  mp.h:17626

62 calls:
  0xf601e722 drm_intel_bo_references() at intel_bufmgr.c:298
  0xf62fe973 brw_map_buffer_range() at intel_buffer_objects.c:390
  0xf60754b6 _mesa_MapBufferRange() at bufferobj.c:2178
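To make the "two functions with different semantics" idea in the conclusion concrete, here is a rough sketch. It is not the attached patch and not libdrm code: struct sketch_bo, its fields, and both function names are made up for illustration. It also assumes that a per-buffer "used as relocation target" flag is kept up to date and that the relocation graph has no cycles. The exact variant walks the relocation tree recursively; the fast variant answers conservatively in constant time, so it may report a reference that the exact walk would not find, but not the other way around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; NOT libdrm's real struct. */
struct sketch_bo {
    struct sketch_bo **reloc_targets; /* buffers this one has relocations to */
    size_t reloc_count;               /* zero for ~97% of calls in the report */
    bool used_as_reloc_target;        /* assumed: set when any buffer relocates to this one */
};

/* Accurate but potentially slow: recursive walk over the relocation tree.
 * Assumes the relocation graph is acyclic. */
bool bo_references_exact(const struct sketch_bo *bo,
                         const struct sketch_bo *target)
{
    assert(bo != NULL && target != NULL);
    for (size_t i = 0; i < bo->reloc_count; i++) {
        const struct sketch_bo *t = bo->reloc_targets[i];
        if (t == target || bo_references_exact(t, target))
            return true;
    }
    return false;
}

/* Fast and conservative: constant time, may return true where the exact
 * walk returns false, but never false where the exact walk returns true
 * (as long as used_as_reloc_target is maintained correctly). */
bool bo_references_fast(const struct sketch_bo *bo,
                        const struct sketch_bo *target)
{
    assert(bo != NULL && target != NULL);
    if (bo->reloc_count == 0)
        return false; /* nothing to reference anything with */
    return target->used_as_reloc_target;
}
```

A caller that only uses the answer to decide whether to flush could use the fast variant, since a false positive costs just one extra flush. Callers that need the precise answer would keep using the recursive one.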
Created attachment 110405
Only emit flush the batch once after a pending query
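Going only by the attachment title, the idea appears to be to remember per query that the flush decision has already been made, so that repeated polling does not repeat the expensive reference check. A minimal sketch of that pattern follows. All names here (sketch_query, sketch_ctx, check_query and the helpers) are hypothetical and are not taken from the patch or from Mesa.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical query object; illustrative only, not Mesa's struct. */
struct sketch_query {
    bool flushed; /* has the flush decision for this query been made already? */
};

/* Hypothetical context that counts how often the costly paths run. */
struct sketch_ctx {
    unsigned reference_checks;   /* times the expensive check was executed */
    unsigned flushes;            /* times the batch was flushed */
    bool batch_references_query; /* stand-in for the real relocation walk result */
};

/* Stand-in for the recursive "does the batch reference this buffer" walk. */
static bool expensive_batch_references(struct sketch_ctx *ctx)
{
    ctx->reference_checks++;
    return ctx->batch_references_query;
}

/* Stand-in for submitting the batch; afterwards it references nothing. */
static void flush_batch(struct sketch_ctx *ctx)
{
    ctx->flushes++;
    ctx->batch_references_query = false;
}

/* Called on every poll of the query. The reference check and the flush
 * happen at most once per query; later polls skip straight past them.
 * The flag would need to be cleared again whenever the query is reused. */
void check_query(struct sketch_ctx *ctx, struct sketch_query *q)
{
    assert(ctx != NULL && q != NULL);
    if (!q->flushed) {
        if (expensive_batch_references(ctx))
            flush_batch(ctx);
        q->flushed = true;
    }
    /* A real driver would now test whether the GPU is done with the buffer. */
}
```

With this shape, an application that polls the same query thousands of times per frame pays for the reference walk once instead of on every call.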
Changing product to one that we have visibility over.
Thanks, I tested the patch and it seems to work fine. Game CPU utilization dropped to nearly half, from >100%. The load is now spread more evenly between cores; earlier, one core was (almost) fully loaded. On my test machine, speed remains about the same, so no regressions. FPS seems to fluctuate a bit more (~3%). I'm not sure why, but I don't think that's a problem.
Reassign to Ken because he has patches on mesa-dev for the bug. :)
Committed to master.
Verified. CPU utilization is lower, and the CPU freq can now also sometimes dip lower (on the core on which witcher is running). FPS is the same, it just fluctuates a bit more due to CPU freq changes.

CPU utilization is now mostly due to there just being a lot of system calls (vdso usage below is just for clock_gettime()), not because of libdrm relocations:

  36,70%  76240  [kernel.kallsyms]
  15,13%  25936  [vdso]
  13,69%  32532  perf-20837.map
  11,94%  20298  i965_dri.so
   8,54%  15647  the
   4,71%   8515  libpthread-2.19.so
   4,30%   7280  libdrm_intel.so.1.0.0
   1,82%   3456  libc-2.19.so
   1,10%   1857  libglapi.so.0.0.0
   0,77%   1304  libdrm.so.2.4.0