Bug 86969

Summary: _drm_intel_gem_bo_references() function takes half the CPU with Witcher2 game
Product: Mesa
Reporter: Eero Tamminen <eero.t.tamminen>
Component: Drivers/DRI/i965
Assignee: Kenneth Graunke <kenneth>
Status: VERIFIED FIXED
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: medium
CC: currojerez, idr
Version: git
Keywords: have-backtrace
Hardware: Other
OS: All
Attachments: hack/test for alternate drm_intel_gem_bo_references() semantics
Only emit flush the batch once after a pending query

Description Eero Tamminen 2014-12-03 11:30:00 UTC
Created attachment 110404
hack/test for alternate drm_intel_gem_bo_references() semantics

Setup:

- HSW GT3e in desktop case
- Ubuntu 14.10 64-bit (kernel 3.16, Xorg 1.16)
- Latest libdrm & Mesa 32-bit builds (2014-11-07)
- Witcher2 game from Steam (32-bit)

Steps:

- Start Witcher2 with latest Mesa
- Select FullHD resolution and highest generic gfx option, then disable anti-aliasing & ubersampling from the advanced options
- Select "Arena" option from the main menu
- After the animation stops, click through the "discussion" and pan around with the mouse

Results:

- When panning around, some orientations show 100% utilization of a single CPU core.
- "perf" attributes nearly *half* of the CPU consumption to the (very small and recursive) libdrm "_drm_intel_gem_bo_references" function.


Analysis:

The only caller of "_drm_intel_gem_bo_references" is the exported "drm_intel_gem_bo_references" function. Tracing calls to that function reveals that it is called from Mesa's gen6_check_query() function. [1]
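The "perf" result makes sense once you picture what a recursive reference check has to do: each call walks the buffer's relocation list and descends into every target's own relocations, so the cost is proportional to the whole relocation tree and is paid again on every query poll. A minimal sketch of such a walk follows, using a hypothetical `struct model_bo` rather than libdrm's real buffer-object layout. The `visits` counter exists only to make the cost visible, and the code assumes the relocation graph has no cycles other than a buffer relocating to itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer object: only the list of buffers it
 * points at through relocations. Not libdrm's drm_intel_bo_gem layout. */
struct model_bo {
    struct model_bo **relocs;   /* relocation targets */
    size_t reloc_count;
};

/* Returns true if `bo` references `target` directly or through any chain
 * of relocations. Every call walks the relocation tree below `bo` again,
 * so the total cost scales with tree size times the number of calls.
 * Assumes no relocation cycles other than a buffer pointing at itself. */
static bool
model_bo_references(const struct model_bo *bo, const struct model_bo *target,
                    unsigned long *visits)
{
    assert(bo != NULL && target != NULL && visits != NULL);

    for (size_t i = 0; i < bo->reloc_count; i++) {
        const struct model_bo *child = bo->relocs[i];

        (*visits)++;                /* one relocation entry examined */
        if (child == target)
            return true;
        if (child == bo)            /* self-relocation: nothing new below */
            continue;
        if (model_bo_references(child, target, visits))
            return true;
    }
    return false;
}
```

With a batch holding hundreds of relocations and a game polling a query result as often as the 194154 calls counted in [1], the same walk is repeated on every poll even when its answer does not change.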

Removing the libdrm _drm_intel_gem_bo_references() CPU bottleneck by flushing unconditionally in gen6_check_query() eliminated most of the CPU consumption, which confirms the "perf" finding. However, those extra flushes made performance marginally worse.


Printing relocation count statistics showed that, for Witcher2, the largest relocation count seen in _drm_intel_gem_bo_references() was 590, but ~97% of the calls had a relocation count of zero.
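A minimal sketch of how such statistics could be gathered, assuming a hypothetical hook `reloc_stats_record()` called with the buffer's relocation count at the top of each (recursive) invocation; this is not the instrumentation that was actually used. Because the function recurses into every relocation target, one plausible reading of the ~97% figure is that most invocations land on leaf buffers with no relocations of their own, while the cost sits in the loop over the large relocation lists.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counters for relocation-count statistics. */
struct reloc_stats {
    unsigned long calls;        /* invocations observed */
    unsigned long zero_calls;   /* invocations on a BO with no relocations */
    unsigned long max_count;    /* largest relocation count seen */
};

/* Call once per invocation with that BO's relocation count. */
static void
reloc_stats_record(struct reloc_stats *s, unsigned long reloc_count)
{
    s->calls++;
    if (reloc_count == 0)
        s->zero_calls++;
    if (reloc_count > s->max_count)
        s->max_count = reloc_count;
}

/* Fraction of invocations that had zero relocations (0.0 if none seen). */
static double
reloc_stats_zero_share(const struct reloc_stats *s)
{
    assert(s->zero_calls <= s->calls);
    return s->calls ? (double)s->zero_calls / (double)s->calls : 0.0;
}

/* Print a one-line summary, e.g. at process exit. */
static void
reloc_stats_print(const struct reloc_stats *s)
{
    fprintf(stderr, "calls=%lu max_relocs=%lu zero_share=%.1f%%\n",
            s->calls, s->max_count, 100.0 * reloc_stats_zero_share(s));
}
```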

Another test was changing the semantics of "drm_intel_gem_bo_references". This also removed most of the Witcher2 CPU consumption, possibly with a speed improvement. *On the test machine*, Witcher2 isn't CPU bound despite the ~100% CPU load, so CPU usage doesn't directly affect its performance. *However*, on a thermally limited machine (e.g. a laptop with GT3), this could have a clear performance impact, as lower CPU consumption may allow the GPU to run at a higher clock speed. Power usage should at least be affected.

Attached is a patch/hack (by Francisco Jerez) for testing this.


Conclusion:

There could be two separate functions with slightly different semantics: a fast one, doing something similar to what Francisco proposed, for (Mesa) callers that don't need more accurate information, and the current libdrm "_drm_intel_gem_bo_references" function for those that do.
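One way such a split could look, as a sketch under stated assumptions: all names are hypothetical, the fast check is a sticky per-buffer flag (set the first time any buffer records a relocation to it, never cleared), which is not necessarily what Francisco's patch does, and the accurate check is the usual recursive relocation walk. The fast function may answer "possibly referenced" for a buffer the batch no longer reaches, but it never answers "not referenced" for one it does reach, so a caller that can tolerate an occasional unnecessary flush could use it on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer-object model; none of these names are libdrm API. */
struct tbo {
    struct tbo **relocs;     /* relocation targets recorded so far */
    size_t reloc_count;
    size_t reloc_cap;        /* capacity of `relocs` */
    bool was_reloc_target;   /* sticky: set once any BO relocates to us */
};

/* Record a relocation from `bo` to `target` and mark the target. */
static void
tbo_add_reloc(struct tbo *bo, struct tbo *target)
{
    assert(bo->reloc_count < bo->reloc_cap);
    bo->relocs[bo->reloc_count++] = target;
    target->was_reloc_target = true;
}

/* Fast, O(1), conservative: false means "definitely not referenced by
 * anything"; true only means "possibly referenced". */
static bool
tbo_maybe_referenced(const struct tbo *target)
{
    return target->was_reloc_target;
}

/* Recursive walk of the relocation tree below `bo`. */
static bool
tbo_walk(const struct tbo *bo, const struct tbo *target)
{
    for (size_t i = 0; i < bo->reloc_count; i++) {
        const struct tbo *child = bo->relocs[i];
        if (child == target)
            return true;
        if (child != bo && tbo_walk(child, target))
            return true;
    }
    return false;
}

/* Accurate: exact answer, using the fast check as a free early exit. */
static bool
tbo_references(const struct tbo *bo, const struct tbo *target)
{
    if (!tbo_maybe_referenced(target))
        return false;
    return tbo_walk(bo, target);
}
```

A sticky flag like this only saves work for buffers that have never been relocation targets; for everything else the accurate walk still runs.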


---

[1] In addition to resource usage tracing, functracer can attach to a running process and track calls to specified (exported) functions:
  https://maemo.gitorious.org/maemo-tools/functracer

According to it, the callers were:

194154 calls (for the trace period):
0xf601e722 drm_intel_bo_references() at intel_bufmgr.c:298
0xf63ca55c gen6_check_query() at gen6_queryobj.c:329
0xf6144e8d _mesa_GetQueryObjectiv() at queryobj.c:620

1133 calls:
0xf601e722 drm_intel_bo_references() at intel_bufmgr.c:298
0xf63ca34c gen6_queryobj_get_results() at gen6_queryobj.c:128
0xf63ca583 gen6_check_query() at gen6_queryobj.c:333
0xf6144e8d _mesa_GetQueryObjectiv() at queryobj.c:620

62 calls:
0xf601e722 drm_intel_bo_references() at intel_bufmgr.c:298
0xf62fe973 brw_map_buffer_range() at intel_buffer_objects.c:390
0xf60754b6 _mesa_MapBufferRange() at bufferobj.c:2178
Comment 1 Chris Wilson 2014-12-03 11:54:52 UTC
Created attachment 110405
Only emit flush the batch once after a pending query
Comment 2 Chris Wilson 2014-12-03 12:13:58 UTC
Changing product to one that we have visibility over.
Comment 3 Eero Tamminen 2014-12-03 16:32:41 UTC
Thanks, I tested the patch and it seems to work fine.

Game CPU utilization dropped from >100% to nearly half of that. The load is now spread more evenly between cores; earlier, one core was (almost) fully loaded. On my test machine, speed remains about the same, so there are no regressions. FPS seems to fluctuate a bit more (~3%); I'm not sure why, but I don't think that's a problem.
Comment 4 Ian Romanick 2014-12-15 21:18:25 UTC
Reassign to Ken because he has patches on mesa-dev for the bug. :)
Comment 5 Kenneth Graunke 2014-12-17 01:03:21 UTC
Committed to master.
Comment 6 Eero Tamminen 2014-12-17 12:25:34 UTC
Verified. CPU utilization is lower, and the CPU frequency can now sometimes dip lower (on the core on which Witcher2 is running). FPS is the same; it just fluctuates a bit more due to CPU frequency changes.

CPU utilization is now mostly due to the sheer number of system calls (the vdso usage below is just for clock_gettime()), not to libdrm relocation processing:
  Overhead     Samples  Shared Object
  36,70%         76240  [kernel.kallsyms]
  15,13%         25936  [vdso]
  13,69%         32532  perf-20837.map
  11,94%         20298  i965_dri.so
   8,54%         15647  the
   4,71%          8515  libpthread-2.19.so
   4,30%          7280  libdrm_intel.so.1.0.0
   1,82%          3456  libc-2.19.so
   1,10%          1857  libglapi.so.0.0.0
   0,77%          1304  libdrm.so.2.4.0
