27614 – [i945gm] X hangs with PGTBL_ER: 0x102 on kernel 2.6.34-rc3

Summary: [i945gm] X hangs with PGTBL_ER: 0x102 on kernel 2.6.34-rc3

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	DRI git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Jesse Barnes
QA Contact:

URL:
Whiteboard:
Keywords:	NEEDINFO

Depends on:
Blocks:

Reported:	2010-04-13 02:13 UTC by Geir Ove Myhr
Modified:	2017-07-24 23:08 UTC
CC List:	1 user

See Also:
i915 platform:
i915 features:

Attachments
Tarball with i915_*, dmesg, Xorg.0.log, etc. from 2.6.34-rc3 with drm.debug=0x02 (395.43 KB, application/x-compressed-tar) 2010-04-13 02:32 UTC, Geir Ove Myhr

Description Geir Ove Myhr 2010-04-13 02:13:06 UTC

Forwarding an Ubuntu bug report from Stefano Rivera:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/560376

[Problem]
Xorg hangs with kernel 2.6.34-rc3 (and also with the standard Ubuntu kernel, which has drm from 2.6.33.1). Userspace is not the newest version because of the pre-release freeze, but I suppose userspace shouldn't be able to cause a page table error anyway.

xserver-xorg 1:7.5+5ubuntu1
libgl1-mesa-glx 7.7-4ubuntu1
libdrm2 2.4.18-1ubuntu2
xserver-xorg-video-intel 2:2.9.1-3ubuntu1


[Original report]
Since the resolution of bug #532100, X has started randomly hanging again, around once a day in my usage.

I can't get a gdb backtrace because X is locked in a system call (see dmesg).

Observed with and without i915.powersave=0 (the provided data was captured with powersaving still enabled).

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: xserver-xorg-video-intel 2:2.9.1-3ubuntu1
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic x86_64
Architecture: amd64
Date: Sun Apr 11 01:02:37 2010
DkmsStatus: Error: [Errno 2] No such file or directory
MachineType: Apple Inc. MacBook2,1
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-19-generic root=UUID=3345fa7f-d2c4-456f-8d0d-8fdb515433f7 ro quiet splash
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_ZA.UTF-8
 SHELL=/bin/bash
SourcePackage: xserver-xorg-video-intel
dmi.bios.date: 06/27/07
dmi.bios.vendor: Apple Inc.
dmi.bios.version: MB21.88Z.00A5.B07.0706270922
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: Mac-F4208CAA
dmi.board.vendor: Apple Inc.
dmi.board.version: PVT
dmi.chassis.asset.tag: Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: Apple Inc.
dmi.chassis.version: Mac-F4208CAA
dmi.modalias: dmi:bvnAppleInc.:bvrMB21.88Z.00A5.B07.0706270922:bd06/27/07:svnAppleInc.:pnMacBook2,1:pvr1.0:rvnAppleInc.:rnMac-F4208CAA:rvrPVT:cvnAppleInc.:ct10:cvrMac-F4208CAA:
dmi.product.name: MacBook2,1
dmi.product.version: 1.0
dmi.sys.vendor: Apple Inc.
system:
 distro: Ubuntu
 codename: lucid
 architecture: x86_64
 kernel: 2.6.32-19-generic

Comment 1 Geir Ove Myhr 2010-04-13 02:32:25 UTC

Created attachment 34962
Tarball with i915_*, dmesg, Xorg.0.log, etc. from 2.6.34-rc3 with drm.debug=0x02

This one is taken with i915.powersave=0, since on 2.6.34-rc3 the computer would hang quickly otherwise (possibly because this patch [1] is not included?). The captured hang occurred after about 5 hours of uptime.

From i915_error_state:

Time: 1271081444 s 749763 us
PCI ID: 0x27a2
EIR: 0x00000010
  PGTBL_ER: 0x00000102
  INSTPM: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x00000000
  INSTDONE: 0x7fffffc0
  ACTHD: 0x00000000
seqno: 0x00000000
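
To make the error registers above easier to read, here is a minimal Python sketch that parses the "NAME: 0x..." lines and lists which bit positions are set. The helper names parse_registers and set_bits are made up for this illustration; the sketch does not name the individual bits, since their meaning has to come from the hardware documentation.

```python
import re

# Register lines copied from the i915_error_state excerpt in this comment.
ERROR_STATE = """\
EIR: 0x00000010
  PGTBL_ER: 0x00000102
  INSTPM: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x00000000
  INSTDONE: 0x7fffffc0
  ACTHD: 0x00000000
"""


def parse_registers(text):
    """Return {name: value} for every 'NAME: 0x...' line in a dump."""
    pattern = re.compile(r"^\s*(\w+):\s*0x([0-9a-fA-F]+)\s*$", re.MULTILINE)
    return {m.group(1): int(m.group(2), 16) for m in pattern.finditer(text)}


def set_bits(value):
    """Return the positions of the bits that are set, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


regs = parse_registers(ERROR_STATE)
for name in ("EIR", "PGTBL_ER"):
    print(f"{name}: 0x{regs[name]:08x} -> bits {set_bits(regs[name])}")
```

For the values in this report, EIR has only bit 4 set and PGTBL_ER has bits 1 and 8 set.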

From i915_gem_seqno (different seqno from above):

Current sequence: 1187656
Waiter sequence:  1187656
IRQ sequence:     1187622
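
As a rough cross-check of these numbers, the following Python sketch computes how far the IRQ sequence lags behind the current one and compares the current sequence with the last i915_add_request logged in dmesg. The helper name parse_seqnos is hypothetical, and the sketch only does arithmetic on the values quoted in this report; it makes no claim about how the driver uses them.

```python
import re

# Values copied from the i915_gem_seqno excerpt in this comment.
SEQNO_DUMP = """\
Current sequence: 1187656
Waiter sequence:  1187656
IRQ sequence:     1187622
"""

# Last request number logged in dmesg: "[drm:i915_add_request], 1187656".
LAST_ADD_REQUEST = 1187656


def parse_seqnos(text):
    """Return {'Current': n, 'Waiter': n, 'IRQ': n} from the dump text."""
    pattern = re.compile(r"^(\w+) sequence:\s*(\d+)\s*$", re.MULTILINE)
    return {m.group(1): int(m.group(2)) for m in pattern.finditer(text)}


seq = parse_seqnos(SEQNO_DUMP)
irq_lag = seq["Current"] - seq["IRQ"]
print("IRQ sequence lags current by:", irq_lag)
print("current equals last add_request:", seq["Current"] == LAST_ADD_REQUEST)
```

With the numbers above, the IRQ sequence is 34 behind the current sequence, and the current sequence equals the last request added before the hang.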

From intel_gpu_dump output (inconsistent with i915_error_state, and I thought GPU reset only happened on i965 and newer; is the i915_error_state from an earlier error that went unnoticed?):

ACTHD: 0x1b209ab8
EIR: 0x00000000
EMR: 0xffffffed
ESR: 0x00000001
PGTBL_ER: 0x00000000
IPEHR: 0x01000000
IPEIR: 0x00000000
INSTDONE: 0x7fffffc0
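
The mismatch between the two dumps can be listed mechanically. This is a small Python sketch, with hypothetical helper names parse_registers and differing, that compares the registers present in both excerpts and reports the ones whose values disagree.

```python
import re

# Register lines copied from the i915_error_state excerpt in this comment.
ERROR_STATE = """\
EIR: 0x00000010
PGTBL_ER: 0x00000102
INSTPM: 0x00000000
IPEIR: 0x00000000
IPEHR: 0x00000000
INSTDONE: 0x7fffffc0
ACTHD: 0x00000000
"""

# Register lines copied from the intel_gpu_dump excerpt in this comment.
GPU_DUMP = """\
ACTHD: 0x1b209ab8
EIR: 0x00000000
EMR: 0xffffffed
ESR: 0x00000001
PGTBL_ER: 0x00000000
IPEHR: 0x01000000
IPEIR: 0x00000000
INSTDONE: 0x7fffffc0
"""


def parse_registers(text):
    """Return {name: value} for every 'NAME: 0x...' line in a dump."""
    pattern = re.compile(r"^\s*(\w+):\s*0x([0-9a-fA-F]+)\s*$", re.MULTILINE)
    return {m.group(1): int(m.group(2), 16) for m in pattern.finditer(text)}


def differing(a, b):
    """Return {name: (a_value, b_value)} for shared registers that differ."""
    return {n: (a[n], b[n]) for n in a.keys() & b.keys() if a[n] != b[n]}


diff = differing(parse_registers(ERROR_STATE), parse_registers(GPU_DUMP))
for name in sorted(diff):
    err, dump = diff[name]
    print(f"{name}: error_state=0x{err:08x} gpu_dump=0x{dump:08x}")
```

On these excerpts the differing registers are ACTHD, EIR, IPEHR and PGTBL_ER, while INSTDONE and IPEIR agree.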

AFAICS, i915_error_state captured only the ringbuffer and no batchbuffer, even though intel_gpu_dump shows ACTHD: 0x1b209ab8.

dmesg output has two blocked tasks (i915:759 and Xorg:1218):

[16976.436317] [drm:i915_add_request], 1187655
[16976.436747] [drm:i915_add_request], 1187656
[16976.932622] [drm:intel_gpu_idle_timer], idle timer fired, downclocking
[16977.422639] [drm:intel_crtc_idle_timer], idle timer fired, downclocking
[17160.652637] INFO: task i915:759 blocked for more than 120 seconds.
[17160.652645] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17160.652651] i915          D ffff880001f15740     0   759      2 0x00000000
[17160.652661]  ffff880099aadd40 0000000000000046 0000000000000000 ffff880099aadfd8
[17160.652671]  ffff88009934dc40 0000000000015740 0000000000015740 ffff880099aadfd8
[17160.652679]  0000000000015740 ffff880099aadfd8 0000000000015740 ffff88009934dc40
[17160.652688] Call Trace:
[17160.652728]  [<ffffffffa02ee030>] ? i915_gem_retire_work_handler+0x0/0xa0 [i915]
[17160.652740]  [<ffffffff8153d98b>] __mutex_lock_slowpath+0xeb/0x180
[17160.652750]  [<ffffffff8100985b>] ? __switch_to+0xbb/0x2e0
[17160.652759]  [<ffffffff8105437e>] ? put_prev_entity+0x2e/0x70
[17160.652782]  [<ffffffffa02ee030>] ? i915_gem_retire_work_handler+0x0/0xa0 [i915]
[17160.652790]  [<ffffffff8153d5ab>] mutex_lock+0x2b/0x50
[17160.652813]  [<ffffffffa02ee06d>] i915_gem_retire_work_handler+0x3d/0xa0 [i915]
[17160.652821]  [<ffffffff81079fbc>] run_workqueue+0xbc/0x190
[17160.652829]  [<ffffffff8107a50b>] worker_thread+0x9b/0x100
[17160.652837]  [<ffffffff8107ec70>] ? autoremove_wake_function+0x0/0x40
[17160.652844]  [<ffffffff8107a470>] ? worker_thread+0x0/0x100
[17160.652851]  [<ffffffff8107e896>] kthread+0x96/0xa0
[17160.652858]  [<ffffffff8100be64>] kernel_thread_helper+0x4/0x10
[17160.652865]  [<ffffffff8107e800>] ? kthread+0x0/0xa0
[17160.652872]  [<ffffffff8100be60>] ? kernel_thread_helper+0x0/0x10
[17160.652893] INFO: task Xorg:1218 blocked for more than 120 seconds.
[17160.652897] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17160.652902] Xorg          D ffff880001f15740     0  1218   1156 0x00400004
[17160.652910]  ffff880099ddbcc8 0000000000000086 ffff880099ddbc78 ffff880099ddbfd8
[17160.652919]  ffff880037fb2e20 0000000000015740 0000000000015740 ffff880099ddbfd8
[17160.652928]  0000000000015740 ffff880099ddbfd8 0000000000015740 ffff880037fb2e20
[17160.652936] Call Trace:
[17160.652944]  [<ffffffff8153d98b>] __mutex_lock_slowpath+0xeb/0x180
[17160.652952]  [<ffffffff8153d5ab>] mutex_lock+0x2b/0x50
[17160.652975]  [<ffffffffa02edf8f>] i915_gem_ring_throttle+0x3f/0x80 [i915]
[17160.652998]  [<ffffffffa02edfe1>] i915_gem_throttle_ioctl+0x11/0x20 [i915]
[17160.653021]  [<ffffffffa023cf23>] drm_ioctl+0x283/0x460 [drm]
[17160.653030]  [<ffffffff812a258f>] ? rb_insert_color+0xdf/0x110
[17160.653054]  [<ffffffffa02edfd0>] ? i915_gem_throttle_ioctl+0x0/0x20 [i915]
[17160.653063]  [<ffffffff81033cf9>] ? default_spin_lock_flags+0x9/0x10
[17160.653071]  [<ffffffff8153ec34>] ? _raw_spin_lock_irqsave+0x34/0x50
[17160.653079]  [<ffffffff810822d5>] ? __remove_hrtimer+0x45/0xb0
[17160.653088]  [<ffffffff8115035a>] vfs_ioctl+0x3a/0xc0
[17160.653095]  [<ffffffff8115094d>] do_vfs_ioctl+0x6d/0x1f0
[17160.653103]  [<ffffffff8106481d>] ? sys_setitimer+0xbd/0xf0
[17160.653110]  [<ffffffff81150b57>] sys_ioctl+0x87/0xa0
[17160.653118]  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
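
For longer logs, the blocked tasks and their reliable stack frames can be pulled out with a short script. This is a Python sketch under the assumption that the log uses the same hung-task format as the excerpt above; the helper name hung_tasks is made up, and frames marked with "?" are skipped because the kernel flags them as unreliable. The sample log is a shortened copy of the trace in this comment.

```python
import re

# Shortened copy of the dmesg excerpt from this comment.
DMESG = """\
[17160.652637] INFO: task i915:759 blocked for more than 120 seconds.
[17160.652688] Call Trace:
[17160.652728]  [<ffffffffa02ee030>] ? i915_gem_retire_work_handler+0x0/0xa0 [i915]
[17160.652740]  [<ffffffff8153d98b>] __mutex_lock_slowpath+0xeb/0x180
[17160.652790]  [<ffffffff8153d5ab>] mutex_lock+0x2b/0x50
[17160.652813]  [<ffffffffa02ee06d>] i915_gem_retire_work_handler+0x3d/0xa0 [i915]
[17160.652893] INFO: task Xorg:1218 blocked for more than 120 seconds.
[17160.652936] Call Trace:
[17160.652944]  [<ffffffff8153d98b>] __mutex_lock_slowpath+0xeb/0x180
[17160.652952]  [<ffffffff8153d5ab>] mutex_lock+0x2b/0x50
[17160.652975]  [<ffffffffa02edf8f>] i915_gem_ring_throttle+0x3f/0x80 [i915]
"""

TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?(\w+)\+0x")


def hung_tasks(log):
    """Return one dict per blocked task with its name, pid and reliable frames."""
    tasks = []
    for line in log.splitlines():
        task = TASK_RE.search(line)
        if task:
            tasks.append({"name": task.group(1), "pid": int(task.group(2)), "frames": []})
            continue
        frame = FRAME_RE.search(line)
        # Keep only frames without the "?" marker, and only once a task was seen.
        if frame and tasks and not frame.group(1):
            tasks[-1]["frames"].append(frame.group(2))
    return tasks


tasks = hung_tasks(DMESG)
for t in tasks:
    print(f"{t['name']}:{t['pid']} -> {' <- '.join(t['frames'])}")
```

Both tasks in the excerpt are stuck in mutex_lock, one called from i915_gem_retire_work_handler and the other from i915_gem_ring_throttle.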

[1]: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commit;h=0d2907f4bead56cff60f91068b3a3efa7149e702

Comment 2 Jesse Barnes 2010-06-01 12:27:51 UTC

Does this still happen with 2.6.34, latest libdrm and Mesa 7.8?

Comment 3 Chris Wilson 2010-06-06 06:44:23 UTC

The i915_error_state looks decoupled from the actual bug. These residual errors should be fixed with:

commit ac0c6b5ad3b3b513e1057806d4b7627fcc0ecc27
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 27 13:18:18 2010 +0100

    drm/i915: Rebind bo if currently bound with incorrect alignment.
    
    Whilst pinning the buffer, check that that its current alignment
    matches the requested alignment. If it does not, rebind.
    
    This should clear up any final render errors whilst resuming,
    for reference:
    
      Bug 27070 - [i915] Page table errors with empty ringbuffer
      https://bugs.freedesktop.org/show_bug.cgi?id=27070
    
      Bug 15502 -  render error detected, EIR: 0x00000010
      https://bugzilla.kernel.org/show_bug.cgi?id=15502
    
      Bug 13844 -  i915 error: "render error detected"
      https://bugzilla.kernel.org/show_bug.cgi?id=13844
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
    Signed-off-by: Eric Anholt <eric@anholt.net>

However, the hang looks unrelated and more reminiscent of a page-flipping bug. Please open a new bug report if you can capture some information on it, thanks.
