Bug 17004

Summary:	[945 gem-classic] gltestperf run abort
Product:	Mesa	Reporter:	liuhaien <haien.liu>
Component:	Drivers/DRI/i965	Assignee:	Eric Anholt <eric>
Status:	VERIFIED WONTFIX	QA Contact:
Severity:	normal
Priority:	medium	CC:	shuang.he
Version:	unspecified
Hardware:	x86 (IA32)
OS:	Linux (All)
Whiteboard:
i915 platform:		i915 features:
Attachments:	xorg.0.log xorg conf file Xorg.0.log_new dmesg info

Description liuhaien 2008-08-06 01:49:20 UTC

Created attachment 18150 [details]
xorg.0.log

System Environment:
--------------------------
Host:           945
Arch:           i386
Kernel:         2.6.26-rc5
Libdrm_gem:drm-gem branch       ceb3d5e3834452f9d54f974b8066f90168467443
Mesa_gem: drm-gem branch        ded9414024ef7b2fb1d991d872c56c0d85e9ce1f
Xserver: master                  26d31ad1c7f4c550d73419ecf76912d844186b30
Xf86_video_intel_gem:drm-gem branch    c2f0df4dc97c87539b66525a277c7d1e2c421f61

Bug detailed description:
--------------------------
Start X, then run 'gltestperf' in classic mode; it aborts with the error:
intelWaitIrq: drm_i915_irq_wait: -16
Backtrace:
#0  0x00c00556 in exit () from /lib/libc.so.6
#1  0xb7a14c2e in intelWaitIrq (intel=0x93a4368, seq=454749)
    at intel_ioctl.c:104
#2  0xb7a0ecb4 in intel_fence_wait (private=0x93a4368, cookie=454749)
    at intel_context.c:457
#3  0xb7c62ff9 in _fence_wait_internal (bufmgr_fake=0x93d2268, cookie=0)
    at intel_bufmgr_fake.c:222
#4  0xb7c6394a in dri_fake_reloc_and_validate_buffer (bo=0x968d7a0)
    at intel_bufmgr_fake.c:885
#5  0xb7c63c0c in dri_fake_process_relocs (batch_buf=0x968d7a0)
    at intel_bufmgr_fake.c:1069
#6  0xb7c620b2 in dri_process_relocs (batch_buf=0x968d7a0) at dri_bufmgr.c:123
#7  0xb7a09783 in _intel_batchbuffer_flush (batch=0x93cb4b8,
    file=0xb7bd486c "intel_context.c", line=380) at intel_batchbuffer.c:166
#8  0xb7a0fabe in intelFlush (ctx=0x93a4368) at intel_context.c:380
#9  0xb7a0b68a in intelClearWithBlit (ctx=0x93a4368, mask=1)
    at intel_blit.c:413
#10 0xb7a0c9ae in intelClear (ctx=0x93a4368, mask=257) at intel_buffers.c:580
#11 0xb7b8d959 in _mesa_Clear (mask=0) at main/buffers.c:184
#12 0x0804947f in init_test03 () at gltestperf.c:110
#13 0x0804a24a in display () at gltestperf.c:499
#14 0xb7efc3c7 in processWindowWorkList (window=0x939b470) at glut_event.c:1306
#15 0xb7efd028 in glutMainLoop () at glut_event.c:1353
#16 0x08049038 in main (ac=1, av=0xd29560) at gltestperf.c:578
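
For readers decoding the message above: the -16 appears to be a negated errno value, and on Linux errno 16 is EBUSY. The following is a minimal sketch of turning such a return value into readable text. describe_ret is a hypothetical helper written for this illustration; it is not part of Mesa or libdrm.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from Mesa/libdrm): format a return value that
 * follows the "0 on success, -errno on failure" convention.
 */
static const char *describe_ret(int ret, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (ret < 0)
        snprintf(buf, len, "%d (%s)", ret, strerror(-ret));
    else
        snprintf(buf, len, "%d (ok)", ret);
    return buf;
}
```

Calling describe_ret(-16, buf, sizeof buf) on a Linux system yields the number followed by the C library's description of EBUSY.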

Reproduce steps:
----------------
1. xinit &
2. ./gltestperf

Comment 1 liuhaien 2008-08-06 01:50:04 UTC

It also happens on Q965.

Comment 2 liuhaien 2008-08-06 02:06:04 UTC

Created attachment 18151 [details]
xorg conf file

Comment 3 liuhaien 2008-08-06 02:14:54 UTC

Before running this demo, disable GEM by running "export INTEL_NO_GEM=1".

Comment 4 liuhaien 2008-08-06 23:33:30 UTC

This is new behavior with GEM. The setup uses the drm-gem branch for all components except the drm module and kernel, which come from the drm-gem-merge branch of anholt's linux-2.6 tree.

Comment 5 liuhaien 2008-08-14 21:51:43 UTC

The issue still exists with the latest master tip.

Comment 6 liuhaien 2008-08-31 19:29:32 UTC

The issue still exists with the latest GEM.

Comment 7 Eric Anholt 2008-09-16 14:37:59 UTC

Not reaching this point with classic on my 915gm -- something else is going wrong, pegging the cpu and leaving junk on the screen when the app is killed.

Comment 8 lin, jiewen 2008-09-18 01:27:15 UTC

With the mesa_7_2 branch, it works well.

Comment 9 lin, jiewen 2008-09-18 01:43:33 UTC

But on 965, this bug also exists with the mesa_7_2 branch.

Comment 10 Eric Anholt 2008-09-23 17:11:45 UTC

libdrm:

commit 2db8e0c8ef8c7a66460fceda129533b364f6418c
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Sep 23 17:06:01 2008 -0700

    intel: Allow up to 15 seconds chewing on one buffer before acknowledging -EB
    
    The gltestperf demo in some cases took over seven seconds to make it through
    one batchbuffer on a GM965.
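
The idea in the commit message above, tolerating a busy result until a time budget runs out, can be sketched as follows. This is a hypothetical illustration and not the libdrm change itself: wait_with_time_budget, probe_fn and the two stand-in callbacks are invented names, and the use of -EBUSY as the "still busy" result is an assumption based on the -16 in the original report.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical callback: returns 0 when done, -EBUSY while still busy. */
typedef int (*probe_fn)(void *ctx);

/* Current monotonic time in seconds. */
static double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Keep probing while the result is -EBUSY, and give up (returning -EBUSY)
 * only once budget_s seconds have elapsed. Any other result is returned
 * immediately.
 */
static int wait_with_time_budget(probe_fn probe, void *ctx, double budget_s)
{
    double deadline;
    int ret;

    assert(probe != NULL && budget_s >= 0.0);
    deadline = monotonic_seconds() + budget_s;

    do {
        ret = probe(ctx);
    } while (ret == -EBUSY && monotonic_seconds() < deadline);

    return ret;
}

/* Stand-ins for the hardware, used only to exercise the loop. */
static int always_busy(void *ctx) { (void)ctx; return -EBUSY; }
static int never_busy(void *ctx)  { (void)ctx; return 0; }
```

With a 15-second budget, a probe that succeeds at once returns 0 immediately, while one that never completes makes the caller wait out the whole budget before seeing -EBUSY.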

Comment 11 liuhaien 2008-09-23 20:39:02 UTC

With your fix, gltestperf still aborts, but without the error message "intelWaitIrq: drm_i915_irq_wait: -16".

Comment 12 Gordon Jin 2008-09-25 04:32:47 UTC

So I'm removing these words from summary: with error "intelWaitIrq: drm_i915_irq_wait: -16"

Comment 13 Eric Anholt 2008-09-25 14:09:27 UTC

Ok, so it aborts without the old message.  What's the new message?

Comment 14 lin, jiewen 2008-09-25 22:31:07 UTC

Created attachment 19225 [details]
Xorg.0.log_new

Comment 15 lin, jiewen 2008-09-25 22:34:45 UTC

Created attachment 19226 [details]
dmesg info

Comment 16 lin, jiewen 2008-09-25 22:40:06 UTC

New message:
log:   Xorg.0.log_new
dmesg: dmesg info

aborted message in gdb:
[Switching to Thread -1211320624 (LWP 7544)]
0xffffe424 in __kernel_vsyscall ()

back trace message:
#0  0xffffe424 in __kernel_vsyscall ()
#1  0x4fa84fa0 in raise () from /lib/libc.so.6
#2  0x4fa868b1 in abort () from /lib/libc.so.6
#3  0xb7a7c079 in _fence_wait_internal (bufmgr_fake=0x805a900, seq=1539997)
    at intel_bufmgr_fake.c:389
#4  0xb7a7c8f6 in dri_fake_reloc_and_validate_buffer (bo=0x832ad18)
    at intel_bufmgr_fake.c:1046
#5  0xb7a7cbdc in dri_fake_bo_exec (bo=0x832ad18, used=16,
    cliprects=0x8202750, num_cliprects=0, DR4=0) at intel_bufmgr_fake.c:1272
#6  0xb7a7af8e in dri_bo_exec (bo=0x832ad18, used=16, cliprects=0x8202750,
    num_cliprects=0, DR4=0) at intel_bufmgr.c:135
#7  0xb7aab71d in _intel_batchbuffer_flush (batch=0x807b4a8,
    file=0xb7c5ea85 "intel_context.c", line=384) at intel_batchbuffer.c:152
#8  0xb7ac4467 in intelFlush (ctx=0x805ac08) at intel_context.c:384
#9  0xb7ab655a in intelClearWithBlit (ctx=0x805ac08, mask=257)
    at intel_blit.c:417
#10 0xb7ab322e in intelClear (ctx=0x805ac08, mask=257) at intel_buffers.c:580
#11 0xb7c0d8c9 in _mesa_Clear (mask=0) at main/buffers.c:184
#12 0x0804945f in init_test03 () at gltestperf.c:110
#13 0x0804a22a in display () at gltestperf.c:499
#14 0xb7f71397 in processWindowWorkList (window=0x8051b00) at glut_event.c:1306
#15 0xb7f71ff8 in glutMainLoop () at glut_event.c:1353
#16 0x08049018 in main (ac=1, av=Cannot access memory at address 0x1d7c
) at gltestperf.c:578

Comment 17 Eric Anholt 2008-10-01 14:34:58 UTC

I meant the output of the application.  The backtrace shows you in abort just after
      drmMsg("%s:%d: Error waiting for fence: %s.\n", __FILE__, __LINE__,
	     strerror(-ret));

Did the application really output nothing despite executing that code?

If the error is about -EBUSY, try bumping max busy_count from 5 to something huge, and see how long it actually takes (assuming that gltestperf isn't actually hanging).
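
The experiment suggested above, raising the busy limit and observing how many retries are actually needed, can be sketched as follows. This is a hypothetical illustration and not the intel_bufmgr_fake.c code: wait_with_busy_limit, wait_fn and fake_wait are invented names, and the real loop may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical wait callback: returns 0 on success or a negative errno. */
typedef int (*wait_fn)(void *ctx);

/*
 * Retry wait() while it reports -EBUSY, giving up after max_busy busy
 * results. The number of -EBUSY results seen is stored in *busy_seen
 * (if non-NULL) so the caller can tell how close the limit was.
 */
static int wait_with_busy_limit(wait_fn wait, void *ctx,
                                int max_busy, int *busy_seen)
{
    int busy_count = 0;
    int ret;

    assert(wait != NULL && max_busy > 0);

    for (;;) {
        ret = wait(ctx);
        if (ret != -EBUSY)
            break;              /* success, or a different error */
        if (++busy_count >= max_busy)
            break;              /* limit reached: report -EBUSY */
    }

    if (busy_seen != NULL)
        *busy_seen = busy_count;
    return ret;
}

/* Stand-in for the hardware: busy until *remaining counts down to 0. */
static int fake_wait(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -EBUSY;
    }
    return 0;
}
```

If the stand-in needs seven retries, a limit of 5 ends in -EBUSY, whereas a much larger limit lets the wait complete and reports the seven busy results it saw.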

Comment 18 lin, jiewen 2008-10-05 19:31:57 UTC

The only output of the application when it aborted is "Aborted", and this is still the case with the latest gem-classic branch.

Comment 19 Gordon Jin 2008-10-06 06:08:48 UTC

Eric, so gltestperf is working fine on your side? Which platforms have you tested?

Comment 20 Eric Anholt 2008-10-14 18:59:47 UTC

Was a problem in the previous releases as well, and this release will be much better at it.  Real applications seem to no longer encounter the problem that this app was hitting.  It's also not a problem with GEM.  Not worth fixing.

Comment 21 liuhaien 2008-10-14 22:49:55 UTC

(In reply to comment #20)
> Was a problem in the previous releases as well, and this release will be much
> better at it.  Real applications seem to no longer encounter the problem that
> this app was hitting.  It's also not a problem with GEM.  Not worth fixing.
> 
We tested it with the drm-intel-next kernel, and it works fine.

Comment 22 Gordon Jin 2008-10-14 22:53:53 UTC

Agree to close it, since it works with the GEM kernel.
