17179 – [i965 classic] openarena (and torcs/doom3/ut2004/celestia) hangs X with error "intel_bufmgr_fake.c:943: dri_fake_emit_reloc: Assertion `target_buf' failed."


Status:	VERIFIED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965
Version:	unspecified
Hardware:	Other All

Importance:	high critical
Assignee:	Eric Anholt
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	intel-mesa-blocker

Reported:	2008-08-17 20:42 UTC by liuhaien
Modified:	2008-10-12 18:54 UTC
CC List:	3 users


Attachments
xorg.0.log (46.14 KB, text/plain) 2008-08-17 20:42 UTC, liuhaien
xorg conf file (3.57 KB, text/plain) 2008-08-17 20:42 UTC, liuhaien
patch to fix the assertion issue (3.83 KB, patch) 2008-09-17 23:51 UTC, haihao

Description liuhaien 2008-08-17 20:42:16 UTC

Created attachment 18341 [details]
xorg.0.log

System Environment:
--------------------------
Host:           965
Arch:           i386
Kernel:         2.6.27-rc2
Libdrm:master       a5381cac55e54a535acf752970886b659948563c
Mesa: master       b7ff70e16a5bd468c76b75c2b557897a827fae73
Xserver: master                  99583b43a9a202d047ff417d47485e4c0e0c9670
Xf86_video_intel:master    6eb3e0f2f4e43e436029fc82e458ac8de1f94745

Bug detailed description:
--------------------------
Start X, then run 'openarena +exec stressbases3' in classic mode. X hangs with the error:
"intel_bufmgr_fake.c:943: dri_fake_emit_reloc: Assertion `target_buf' failed."

The version of openarena is 0.8.0, and stressbases3 is one of its demos.

Backtrace (obtained from gdb after pressing CTRL+C):
 
#0  0xffffe424 in __kernel_vsyscall ()
#1  0x00c9fc29 in ioctl () from /lib/libc.so.6
#2  0xb7de0748 in drmIoctl (fd=13, request=1074291754, arg=0xbf815be8)
    at xf86drm.c:183
#3  0xb7de141f in drmGetLock (fd=13, context=1, flags=0) at xf86drm.c:1272
#4  0xb7deda35 in DRILock (pScreen=0x8247760, flags=0) at dri.c:2183
#5  0xb7dedaa1 in DRIDoWakeupHandler (screenNum=0, wakeupData=0x0, result=1,
    pReadmask=0x81e1d80) at dri.c:1637
#6  0xb7decaa7 in DRIWakeupHandler (wakeupData=0x0, result=1,
    pReadmask=0x81e1d80) at dri.c:1609
#7  0x0808a779 in WakeupHandler (result=1, pReadmask=0x81e1d80)
    at dixutils.c:418
#8  0x081274db in WaitForSomething (pClientsReady=0x82f3fe8) at WaitFor.c:231
#9  0x08086abd in Dispatch () at dispatch.c:368
#10 0x0806c76a in main (argc=2, argv=0xbf816044, envp=Cannot access memory at address 0x40086432
) at main.c:391

Reproduce steps:
----------------
1.xinit&
2.export INTEL_NO_GEM=1
3../openarena +exec stressbases3

Comment 1 liuhaien 2008-08-17 20:42:45 UTC

Created attachment 18342 [details]
xorg conf file

Comment 2 liuhaien 2008-08-17 22:45:39 UTC

stressbases3 should be stress_bases3, and you can run it with the steps below:
1. Run openarena.
2. Select Demos in the menu.
3. In the demos list you will find stress_bases3; run it.

Comment 3 liuhaien 2008-09-02 01:51:57 UTC

This issue also exists on GM45.

Comment 4 lin, jiewen 2008-09-03 02:01:16 UTC

It still exists with the current unstable build.

Comment 5 lin, jiewen 2008-09-03 02:16:26 UTC

This issue also exists when running 3D games like torcs, doom3-demo, ut2004, et, and celestia.

Comment 6 Gordon Jin 2008-09-08 05:06:35 UTC

Increasing priority since it impacts many 3D games.
Note that it can be reproduced with a non-GEM kernel.

Comment 7 Gordon Jin 2008-09-12 01:19:20 UTC

Reassigning to Haihao.
The driver contains GEM, but the kernel doesn't, so it's running in classic mode.

Comment 8 haihao 2008-09-17 23:51:46 UTC

Created attachment 18970 [details] [review]
patch to fix the assertion  issue

This assertion issue was introduced by GEM commit d2796939f18815935c8fe1effb01fa9765d6c7d8.  There are two problems:
 1. brw->curbe.curbe_bo is set to NULL in intel_batchbuffer_flush if check_aperture fails.
 2. input->bo is also set to NULL after uploading, so brw_emit_vertices will dereference a NULL pointer when re-emitting all state.

Comment 9 haihao 2008-09-17 23:58:05 UTC

Hi, Eric
   Could you review this patch? Note that this patch only resolves the assertion issue.
X still hangs when running celestia etc. If I reverted the move of the bufmgr to libdrm, the hang issue would disappear.

Thanks.
haihao

Comment 10 haihao 2008-09-18 00:55:23 UTC

Hi, Eric
   It seems the hang issue is caused by the following code in intel_bufmgr_fake.c

   /* The kernel implementation of IRQ_WAIT is broken for wraparound, and has
    * been since it was first introduced.  It only checks for
    * completed_seq >= seq, and thus never returns for pre-wrapped irq values
    * if the GPU wins the race.
    *
    * So, check if it looks like a pre-wrapped value and just return success.
    */
   if (*bufmgr_fake->last_dispatch - cookie > 0x4000000)
      return;

I don't understand why you added this check.  For example: *bufmgr_fake->last_dispatch is 1000 and cookie is 1005; note that *bufmgr_fake->last_dispatch and cookie are both unsigned int.  So *bufmgr->last_dispatch will never get updated.

Thanks
haihao
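
The arithmetic in the example from comment 10 can be checked with a minimal C sketch. The helper names quoted_check_skips_wait and seq_passed are hypothetical and are not part of intel_bufmgr_fake.c. The signed-difference comparison is a common wraparound-safe idiom shown only for contrast; it is not claimed to be the fix that was eventually committed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: same expression as the quoted check, with
 * last_dispatch and cookie passed in as plain 32-bit unsigned values.
 * Returns true when the quoted code would return without waiting. */
static bool quoted_check_skips_wait(uint32_t last_dispatch, uint32_t cookie)
{
    /* Unsigned subtraction wraps modulo 2^32, so 1000 - 1005 becomes
     * 0xFFFFFFFB, which is larger than 0x4000000. */
    return last_dispatch - cookie > 0x4000000u;
}

/* Hypothetical helper: common wraparound-safe idiom. The difference is
 * reinterpreted as signed, so "cookie slightly ahead" is negative and
 * "cookie already reached" is zero or positive. Assumes the usual
 * two's-complement conversion from uint32_t to int32_t. */
static bool seq_passed(uint32_t last_dispatch, uint32_t cookie)
{
    return (int32_t)(last_dispatch - cookie) >= 0;
}
```

With last_dispatch = 1000 and cookie = 1005, quoted_check_skips_wait returns true, so the wait is skipped even though the cookie has not been reached yet, while seq_passed returns false for the same inputs.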

Comment 11 lin, jiewen 2008-09-19 00:44:51 UTC

(In reply to comment #10)
> Hi, Eric
>    It seems the hang issue is caused by the following code in
> intel_bufmgr_fake.c
> 
>    /* The kernel implementation of IRQ_WAIT is broken for wraparound, and has   
>     * been since it was first introduced.  It only checks for                   
>     * completed_seq >= seq, and thus never returns for pre-wrapped irq values   
>     * if the GPU wins the race.                                                 
>     *                                                                           
>     * So, check if it looks like a pre-wrapped value and just return success.   
>     */                                                                          
>    if (*bufmgr_fake->last_dispatch - cookie > 0x4000000)                        
>       return;                                                                   
> 
> I don't understand why you add this check.  For example:
> *bufmgr_fake->last_dispatch is 1000, and cookie is 1005, note
> *bufmgr_fake->last_dispatch and cookie all are unsigned int.  So
> *bufmgr->last_dispatch will never get updated.
> 
> Thanks
> haihao
> 
This issue goes away after commenting out the following check, and so does bug 17623:

   if (*bufmgr_fake->last_dispatch - cookie > 0x4000000)
      return;

Comment 12 Eric Anholt 2008-09-23 15:18:56 UTC

Thanks for catching the comparison bug, and I've got a fix I'm testing for that, but this bug is different.  OpenArena is triggering the aperture space overflow case in the middle of batchbuffer state emit, which can't be allowed to happen (as you'd start a new batchbuffer without some of the relocations you're depending on for your state).

Comment 13 Eric Anholt 2008-09-23 16:19:39 UTC

commit d533da2db873942b3f8676a754b8be3c9718bedf
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Sep 23 15:53:29 2008 -0700

    i965: Cope with batch getting flushed in the middle of batchbuffer emits.
    
    This isn't required for GEM (at least, yet), but the check_aperture code
    for non-GEM results in batch getting flushed during emit.  brw_state_upload
    restarts state emits, but a bunch of the state emit functions were assuming
    that they would be called exactly once, after prepare and before new_batch.
    
    Bug #17179.

Comment 14 liuhaien 2008-10-12 18:54:49 UTC

verified.
