Bug 18081

Summary: [G45] Broken Sword 3 (through WINE) hangs X
Product: Mesa
Reporter: Sven Arvidsson <sa>
Component: Drivers/DRI/i965
Assignee: Eric Anholt <eric>
Status: RESOLVED FIXED
QA Contact:
Severity: critical
Priority: medium
CC: mmoneta
Version: git
Keywords: NEEDINFO
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments: xorg config
xorg log
dmesg output after crash
backtrace from hang
dmesg from drm-intel-next
Xorg log from drm-intel-next
Backtrace from warzone2100 hang
backtrace from prey
hang on xorg start with drm-intel-next
Backtrace from hyperspace
gpu dump after hang

Description Sven Arvidsson 2008-10-15 15:23:00 UTC
Bug description:
I'm getting frequent hangs when I run 3D applications. One of the easiest ways to reproduce this is at a specific point in the game Broken Sword 3 (running in Wine). But I have also been getting similar hangs with different games and applications using OpenGL, such as the fullscreen feature in F-Spot.

The screen stops updating, but the system as a whole doesn't freeze; it's still reachable over the network.

I'm now using master of mesa, but it also happens with version 7.2.

System environment:
-- chipset: G45 / ICH10R
-- system architecture: 32-bit
-- xf86-video-intel: 6707371176147340fabc9ab6f1e3d6d5ac980662
-- xserver: 1.5.2
-- mesa: 4830809524b20e517e949151957512b14d7e679a
-- drm: 458e2d5bc5f949d00cfcc9a3f9ce89f0c9f5628c
-- kernel: 2.6.27 with this patch:
http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git;a=commit;h=2052746fc8397130c120f0194a89938b0b62b6cb
-- Linux distribution: Debian unstable
-- Machine or mobo model: Asus P5Q-EM
-- Display connector: DVI
Comment 1 Sven Arvidsson 2008-10-15 15:23:37 UTC
Created attachment 19675 [details]
xorg config
Comment 2 Sven Arvidsson 2008-10-15 15:24:04 UTC
Created attachment 19676 [details]
xorg log
Comment 3 Sven Arvidsson 2008-10-15 15:24:49 UTC
Created attachment 19677 [details]
dmesg output after crash
Comment 4 Sven Arvidsson 2008-10-15 15:25:26 UTC
Created attachment 19678 [details]
backtrace from hang
Comment 5 Gordon Jin 2008-10-15 19:11:32 UTC
More new G45 patches have been added to drm-intel-next. Can you retest with the latest drm-intel-next?
Comment 6 Sven Arvidsson 2008-10-16 13:34:43 UTC
With drm-intel-next I can't start X at all. i915 seems to be causing a kernel oops.

I'm attaching dmesg output and the Xorg log.
Comment 7 Sven Arvidsson 2008-10-16 13:35:38 UTC
Created attachment 19701 [details]
dmesg from drm-intel-next
Comment 8 Sven Arvidsson 2008-10-16 13:36:42 UTC
Created attachment 19702 [details]
Xorg log from drm-intel-next
Comment 9 Eric Anholt 2008-10-17 00:08:19 UTC
From your log:
[   54.153163] pci 0000:00:02.0: pg_start == 0x00001fff, intel_private.gtt_entries == 0x00002000
[   54.153172] pci 0000:00:02.0: trying to insert into local/stolen memory

This strongly suggests that you're not running drm-intel-next, in particular the following commit:

commit 2052746fc8397130c120f0194a89938b0b62b6cb
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Oct 14 11:28:58 2008 -0700

    agp: Fix stolen memory counting on G4X.
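
For anyone re-checking their own log, here is a minimal sketch of the comparison those two lines appear to describe. It assumes the message is printed when the first page of an insert (pg_start) falls below the number of GTT entries counted as stolen memory (gtt_entries); that reading, the regular expression, and the helper name overlaps_stolen are illustrative assumptions, not code from the kernel.

```python
import re

# Assumed line format, copied from the log excerpt quoted in this comment.
LINE_RE = re.compile(
    r"pg_start == (0x[0-9a-fA-F]+), "
    r"intel_private\.gtt_entries == (0x[0-9a-fA-F]+)"
)


def overlaps_stolen(log_line):
    """Return True if pg_start is below gtt_entries, None if the line does not match.

    Hypothetical helper: it only re-does the comparison on the two hex
    values found in the log line.
    """
    match = LINE_RE.search(log_line)
    if match is None:
        return None
    pg_start = int(match.group(1), 16)
    gtt_entries = int(match.group(2), 16)
    return pg_start < gtt_entries


line = (
    "[   54.153163] pci 0000:00:02.0: pg_start == 0x00001fff, "
    "intel_private.gtt_entries == 0x00002000"
)
print(overlaps_stolen(line))
```

With the values from the log, pg_start is one page below gtt_entries, which fits a stolen memory count that is off on G4X without the commit above.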
Comment 10 Eric Anholt 2008-10-17 00:45:39 UTC
Also, it looks like your original configuration had the G45 fixes and they were working.  You might try your applications with vblank_mode=0 in driconf instead of vblank_mode=2 which is the default -- we're sorting out a bunch of vblank-related issues currently.  Does killing the offending applications get you back to a working desktop? (should if it's vblank, unlikely if it isn't)  Can you get backtraces of the apps at the point that they've failed?
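
As an alternative to editing driconf, a single application can be tested with the setting applied only to that process. The sketch below assumes Mesa's DRI drivers read a vblank_mode environment variable that overrides the driconf value; the helper names and the example command are hypothetical.

```python
import os
import subprocess


def env_with_vblank_mode(mode, base_env=None):
    """Return a copy of the environment with vblank_mode set to the given value."""
    env = dict(os.environ if base_env is None else base_env)
    env["vblank_mode"] = str(mode)
    return env


def run_with_vblank_mode(command, mode=0):
    """Run command (a list of arguments) with vblank_mode set, return its exit code."""
    return subprocess.run(command, env=env_with_vblank_mode(mode)).returncode
```

For example, run_with_vblank_mode(["warzone2100"]) would start the game with vblank_mode=0 without changing the default for anything else.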
Comment 11 Sven Arvidsson 2008-10-17 13:37:37 UTC
Created attachment 19727 [details]
Backtrace from warzone2100 hang

(In reply to comment #9)
> This strongly suggests that you're not running drm-intel-next
My mistake, I forgot to switch branches. X does start now, but all I get is a black background and the standard x-shaped cursor.

(In reply to comment #10)
> Also, it looks like your original configuration had the G45 fixes and they were
> working.  You might try your applications with vblank_mode=0 in driconf instead
> of vblank_mode=2 which is the default 
vblank_mode=0 does not seem to make any difference.

I can get back to the desktop after killing one application, a native game called warzone2100. This is also the only one I could get a backtrace from. This might be a different bug as it only seems to hang if run in 1680x1050 instead of the default 640x480. Resolution has no impact on the other games.

I failed to get backtraces from both the wine games and native ones such as nexuiz; for some reason I can't Ctrl-C into gdb with them.

Also, these errors do not happen when warzone2100 hangs:
[  104.104006] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 21958 emitted: 21961
[  107.104006] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 21958 emitted: 21961
[  110.104006] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 21958 emitted: 21961
[  113.108006] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 21958 emitted: 21961
[  116.108007] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 21958 emitted: 21961
Comment 12 Sven Arvidsson 2008-10-25 12:02:50 UTC
Another game where this happens is the newly released demo of "Prey". 

The screen turns pink, with some corruption, and I get music for a short while before X hangs.

dmesg:
[  320.440006] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 26294 emitted: 26297
[  323.444008] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 26294 emitted: 26297
[  326.440005] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 26294 emitted: 26297
[  329.444009] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 26294 emitted: 26297
[  332.440007] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 26294 emitted: 26297
[  335.496505] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 0 emitted: 26298
[  338.496005] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 0 emitted: 26298
[  341.500005] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 0 emitted: 26298
[  344.496005] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 0 emitted: 26298
[  347.499148] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 0 emitted: 26298
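
A small sketch for summarising these messages from a longer dmesg capture. It assumes "rec" is the last sequence number the driver read back from the hardware and "emitted" is the last one it submitted; that interpretation and the function names are assumptions for illustration. Runs where rec stays fixed while the message repeats are the interesting ones.

```python
import re

# Matches the i915_wait_irq EBUSY lines exactly as they appear in this report.
EBUSY_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] \[drm:i915_wait_irq\] \*ERROR\* EBUSY "
    r"-- rec: (\d+) emitted: (\d+)"
)


def parse_ebusy(dmesg_text):
    """Return a list of (timestamp, rec, emitted) tuples found in the text."""
    return [
        (float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in EBUSY_RE.finditer(dmesg_text)
    ]


def stalled_runs(entries):
    """Group consecutive entries sharing the same rec value.

    Returns a list of (rec, highest emitted seen, number of messages).
    """
    runs = []
    for _, rec, emitted in entries:
        if runs and runs[-1][0] == rec:
            _, prev_emitted, count = runs[-1]
            runs[-1] = (rec, max(prev_emitted, emitted), count + 1)
        else:
            runs.append((rec, emitted, 1))
    return runs


sample = """\
[  320.440006] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 26294 emitted: 26297
[  323.444008] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 26294 emitted: 26297
[  335.496505] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 0 emitted: 26298
"""
for rec, emitted, count in stalled_runs(parse_ebusy(sample)):
    print(f"rec={rec} emitted={emitted} repeated {count} time(s)")
```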

Xorg.log:
intel_bufmgr_fake.c:392: Error waiting for fence: Device or resource busy.

I will attach a backtrace from Prey, but it's from before the hang, when the music still plays.

I'm running current master of mesa, drm, xf86-video-intel and drm-intel-next.
Comment 13 Sven Arvidsson 2008-10-25 12:03:56 UTC
Created attachment 19862 [details]
backtrace from prey
Comment 14 Sven Arvidsson 2008-10-25 12:07:55 UTC
Created attachment 19863 [details]
hang on xorg start with drm-intel-next

(In reply to comment #12)
> I'm running current master of mesa, drm, xf86-video-intel and drm-intel-next.
> 

My mistake, I'm not running drm-intel-next but vanilla 2.6.27. drm-intel-next still hangs on start. I'm attaching a backtrace, but I guess it belongs in another bug report.
Comment 15 Sven Arvidsson 2008-11-01 14:50:30 UTC
Compiz, when run with the blur plugin and "alpha_blur = true" and "alpha_blur_match = normal" results in an immediate hang and seems to be fully reproducible. I hope this is a better test case.

I can make available my full compiz configuration if necessary.
Comment 16 Sven Arvidsson 2008-11-04 15:24:16 UTC
An update, I'm now using:

-- xf86-video-intel: 040d9bf9d8748d1ed8f977a6356d198def978b51
-- mesa: 4be624d693554ad3950afab90e331a6725cc5004
-- drm: 87e90c73620b88005fcca5fd40aaaad0b08932e1
-- kernel: drm-intel-next 78d3308fc296d1f960a0b35912ba4178f56a3d6f

I can now start X with drm-intel-next and it seems to work about as well as 2.6.27. Sadly, the crashes still happen with GEM.

Comment 17 Sven Arvidsson 2008-11-12 15:38:11 UTC
Created attachment 20270 [details]
Backtrace from hyperspace

A short update, now using:
-- xf86-video-intel: 667923559219429b0c5fec12a0164f7eba1f8f2d
-- mesa: e1fbb30211549f2ee79d8ff9764f833e5317bebe
-- drm: 930c0e7cf4f4776f7a69e7acc6fedeed7addb235
-- same versions of kernel and xserver

Warzone 2100 and Prey no longer hang. Prey isn't playable due to corrupt graphics, but that's a separate problem.

The original problem with Broken Sword 3 remains. I have also found a few new problem-makers:

Episode 2 of On the Rain-Slick Precipice of Darkness hangs after a few minutes of playing, but I can't get a backtrace here.

The Hyperspace demo from Really Slick Screensavers (RSS) causes an immediate hang and causes X to exit. I'm attaching a backtrace which seems to be good.

End of Xorg log:

Fatal server error:
Failure to wait for IRQ: Device or resource busy

dmesg output:

[   56.681716] [drm] Initialized drm 1.1.0 20060810
[   56.686227] pci 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[   56.686231] pci 0000:00:02.0: setting latency timer to 64
[   56.686492] pci 0000:00:02.0: irq 219 for MSI/MSI-X
[   56.686558] [drm] Initialized i915 1.6.0 20080730 on minor 0
[   56.697034] mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
[  115.656006] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 811 emitted: 812
[  118.660008] [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 811 emitted: 813
[  195.484006] [drm:i915_gem_idle] *ERROR* hardware wedged
Comment 18 Sven Arvidsson 2008-11-13 13:02:19 UTC
The Mesa demo fbo_firecube hangs in the same manner as Hyperspace, so it might be an easier test case.

At one point it even crashed the drm kernel module, but I haven't been able to reproduce that.
Comment 19 Eric Anholt 2008-12-08 14:35:35 UTC
Please choose one application and make this bug report about that one application, and open separate bugs for other apps.

For example, fbo_firecube is fixed, and I've successfully run OTRSPOD:E2 for a bit, but I can't close this bug because you've listed 6 different failing applications at various points in time.
Comment 20 Sven Arvidsson 2008-12-08 15:31:17 UTC
Point taken. Let's keep this bug about the original problem, Broken Sword 3 (running through wine) locking up X at a specific point.

The hang still happens with these versions:

-- xf86-video-intel: bea98cdfd93fc1181a06c51e57fcab227ff4827e
-- xserver: 1.5.2
-- mesa: a0d5c3cfe6582f8294154f6877319193458158a2
-- drm: c99566fb810c9d8cae5e9cd39d1772b55e2f514c
-- kernel: for-airlied 66647dc60d16fae9f6963fd98b6d9baa1a8dac69

BTW, now that this is only tracking one application, maybe it shouldn't be a release blocker?

Comment 21 Sven Arvidsson 2009-05-22 16:19:22 UTC
Created attachment 26133 [details]
gpu dump after hang

Not sure if it's helpful here, but I'm attaching a gpu dump after the hang.
Comment 22 Eric Anholt 2009-06-17 11:51:00 UTC
Thanks for the dump.  We've got something weird going on with dumps these days, where the HEAD pointer within the batchbuffer isn't being reported.  Hmm.
Comment 23 Eric Anholt 2009-06-30 18:14:54 UTC
There were some bugs in intel_gpu_dump that broke HEAD pointer reporting, and
master fixes that and adds some more interesting information.  Not sure if
it'll help reveal anything, but it may be useful.

Also, mesa master has a fix for some G45 hangs, which may be valuable to test
against.

commit c3499f6c66bf93d7752ea70a13bbbab3d2b2c288
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Jun 30 14:26:06 2009 -0700

    i965: Increase G4X default VS URB allocation to actually allow 32 threads.
Comment 24 Sven Arvidsson 2009-07-03 15:25:51 UTC
Unfortunately, there's a regression which causes the game to hang on loading. I have bisected the problem and filed bug 22609.
Comment 25 Sven Arvidsson 2009-07-19 07:35:11 UTC
(In reply to comment #24)
> Unfortunately, there's a regression which causes the game to hang on loading. I
> have bisected the problem and filed bug 22609.
> 

This was my fault, texture tiling on again.

Good news, with mesa master the hang is gone! :)
Comment 26 Adam Jackson 2009-08-24 12:31:00 UTC
Mass version move, cvs -> git
