Summary: | [Arrandale] Intermittent freezes - missed interrupt? | ||
---|---|---|---|
Product: | xorg | Reporter: | Christopher James Halse Rogers <chalserogers> |
Component: | Driver/intel | Assignee: | Chris Wilson <chris> |
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | normal | ||
Priority: | medium | CC: | daniel, ilmari |
Version: | unspecified | Keywords: | NEEDINFO |
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | | i915 features: | |
Description
Christopher James Halse Rogers
2010-06-28 23:35:38 UTC
I believe this is the relevant commit:

commit e552eb7038a36d9b18860f525aa02875e313fe16
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Wed Apr 21 11:39:23 2010 -0700

    drm/i915: use PIPE_CONTROL instruction on Ironlake and Sandy Bridge

    Since 965, the hardware has supported the PIPE_CONTROL command, which
    provides fine grained GPU cache flushing control. On recent chipsets,
    this instruction is required for reliable interrupt and sequence number
    reporting in the driver.

    So add support for this instruction, including workarounds, on Ironlake
    and Sandy Bridge hardware.

    https://bugs.freedesktop.org/show_bug.cgi?id=27108

    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Eric Anholt <eric@anholt.net>

I have been experiencing this issue on 2.6.35-rc3 when a bunch of kernel debug options are enabled - let me know if you'd like the .config (tuned for an Intel desktop board). It reproduces reliably, usually within an hour of activity, and we get a similar signature [1] to other reports. Commit e552eb7038a36d9b18860f525aa02875e313fe16 doesn't seem to address the issue, as this commit was introduced in 2.6.34.

---
[1]
INFO: task i915:774 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
i915 D ffff88010f0ba610 6280 774 2 0x00000000
 ffff88010f22dcb0 0000000000000086 ffff88010e742a00 ffff88010f0ba610
 ffff88010f22dfd8 00000000001d4440 ffff88010f22dfd8 ffff88010f0ba610
 00000000001d4440 00000000001d4440 ffff88010f22dfd8 00000000001d4440
Call Trace:
 [<ffffffff813a41ba>] ? intel_idle_update+0x4a/0x220
 [<ffffffff816425fa>] mutex_lock_nested+0x1ea/0x4c0
 [<ffffffff813a41ba>] ? intel_idle_update+0x4a/0x220
 [<ffffffff813a41ba>] ? intel_idle_update+0x4a/0x220
 [<ffffffff81644d03>] ? _raw_spin_unlock_irqrestore+0x53/0xa0
 [<ffffffff813a4170>] ? intel_idle_update+0x0/0x220
 [<ffffffff813a41ba>] intel_idle_update+0x4a/0x220
 [<ffffffff813a4170>] ? intel_idle_update+0x0/0x220
 [<ffffffff81069b20>] worker_thread+0x220/0x390
 [<ffffffff81069ace>] ? worker_thread+0x1ce/0x390
 [<ffffffff8108475d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8106eb70>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81069900>] ? worker_thread+0x0/0x390
 [<ffffffff8106e65e>] kthread+0xae/0xc0
 [<ffffffff81004024>] kernel_thread_helper+0x4/0x10
 [<ffffffff81645114>] ? restore_args+0x0/0x30
 [<ffffffff8106e5b0>] ? kthread+0x0/0xc0
 [<ffffffff81004020>] ? kernel_thread_helper+0x0/0x10
INFO: lockdep is turned off.
INFO: task Xorg:2004 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Xorg D ffffffff8180a360 4608 2004 1936 0x00400004
 ffff8801112d5c58 0000000000000086 ffff8801112d5bd8 ffffffffffffffff
 ffff8801112d5fd8 00000000001d4440 ffff8801112d5fd8 ffff8801078f0000
 00000000001d4440 00000000001d4440 ffff8801112d5fd8 00000000001d4440
Call Trace:
 [<ffffffff813958fb>] ? i915_gem_throttle_ioctl+0x3b/0x90
 [<ffffffff816425fa>] mutex_lock_nested+0x1ea/0x4c0
 [<ffffffff813958fb>] ? i915_gem_throttle_ioctl+0x3b/0x90
 [<ffffffff814f27f5>] ? sock_aio_write+0x125/0x140
 [<ffffffff813958fb>] ? i915_gem_throttle_ioctl+0x3b/0x90
 [<ffffffff813958fb>] i915_gem_throttle_ioctl+0x3b/0x90
 [<ffffffff8137734a>] drm_ioctl+0x33a/0x4c0
 [<ffffffff812d3d9c>] ? debug_object_deactivate+0x5c/0x110
 [<ffffffff812d333c>] ? do_raw_spin_unlock+0x6c/0xc0
 [<ffffffff8108280d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff81644d03>] ? _raw_spin_unlock_irqrestore+0x53/0xa0
 [<ffffffff812d3deb>] ? debug_object_deactivate+0xab/0x110
 [<ffffffff81072ffc>] ? lock_hrtimer_base+0x2c/0x60
 [<ffffffff8112eff8>] vfs_ioctl+0x38/0xd0
 [<ffffffff8112f1ca>] do_vfs_ioctl+0x8a/0x5b0
 [<ffffffff81058dbc>] ? do_setitimer+0x1cc/0x1f0
 [<ffffffff810e8d72>] ? might_fault+0x72/0xd0
 [<ffffffff8112f73a>] sys_ioctl+0x4a/0x80
 [<ffffffff812d3d9c>] ? debug_object_deactivate+0x5c/0x110
 [<ffffffff81003182>] system_call_fastpath+0x16/0x1b

Can you confirm that it is waiting on a request and not stuck polling registers? i.e. cpu time is 0%. Can you also get the full stack traces of all processes so we can see which one is holding the mutex. The contents of /sys/kernel/debug/dri/0/ (in particular the interrupts and requests) will be useful, as would an intel_reg_dumper. The 6 pipe-control flushes [actually repeated dword stores] are what the docs suggest as the work-around for fun hardware.

Can you confirm if you are indeed still seeing this on 2.6.35? And if so give me the info I requested. Thanks.

Still not sure what the cause actually is, but 2.6.36-rc3 contains a patch that will break any waits on a missed interrupt.

Hello,

I think I have exactly the same problem. The logfile is really similar. At a certain moment my X hangs; I can still log in over ssh and reboot my machine. I think this problem might be caused by the GPU getting too warm. I have had this problem 3 times now; yesterday I got it after only 10 minutes or so of working with my system... So it has really become a serious problem for me!

I'm using Ubuntu 10.04.1 LTS. Kernel: 2.6.32-24-generic. I have a D945GSEJT board (ATOM N270, 945G intel onboard graphics). This board does not have a fan; it is cooled passively. I had many troubles with my first board, which at a certain moment did not work at all anymore. So I had to send it back and I got a replacement, but I think this is a repaired board. Now I have a problem again. This one.

Can this problem be caused by the GPU getting too warm because it is not sufficiently cooled?

One other thing: Is this problem really fixed already? What do I need to do to get the fix? And is the cause clear? I mean, is it fixed as a workaround or was it really a bug... Please let me know. I found this page by googling the error that I have...

Thanks,
Pieter Langendonck

We are fairly confident that we have this fixed, so I am marking the report fixed. If the new hangcheck fires or if the machine is still freezing on 2.6.36, then please re-open.

Pieter, you have a separate bug; google is mistaken. Unlikely to be overheating, since the Atom has such a low TDP that passive cooling should be sufficient; if it were overheating, it should be throwing up more errors [MCE] than just hangs.
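
For anyone asked for the contents of /sys/kernel/debug/dri/0/ as in the request above, here is a minimal sketch of one way to collect them into a single text blob for attaching to a report. It is not a tool from this bug: the helper names `dump_debugfs` and `format_dump` are made up for illustration, it assumes debugfs is mounted at /sys/kernel/debug and that the script runs as root, and it simply reads every regular file in the directory rather than assuming particular file names.

```python
import os


def dump_debugfs(root="/sys/kernel/debug/dri/0"):
    """Return {filename: text} for each regular file directly under root.

    The default path is the one mentioned in the thread; whether it exists
    depends on the kernel configuration and on debugfs being mounted.
    """
    dump = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        try:
            with open(path, "r", errors="replace") as fh:
                dump[name] = fh.read()
        except OSError as exc:
            # Some debugfs files can fail on read; record that, keep going.
            dump[name] = "<unreadable: %s>" % exc
    return dump


def format_dump(dump):
    """Join the collected files into one text blob with a header per file."""
    parts = []
    for name, text in dump.items():
        parts.append("==== %s ====\n%s" % (name, text.rstrip("\n")))
    return "\n\n".join(parts)
```

Calling `print(format_dump(dump_debugfs()))` as root and redirecting the output to a file would give something attachable; the register dump and the per-process stack traces that were also requested are not covered by this sketch.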