I just got an Intel Clarkdale system, and installed Sabayon Linux-5.3-rc on it. This has the following components:
- 2.6.34-sabayon kernel ( Linux kakak 2.6.34-sabayon #1 SMP Mon May 31 16:00:15 UTC 2010 x86_64 Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz GenuineIntel GNU/Linux )
With just gnome running, with NO compositing manager, I am getting random hard lockups. CTRL-ALT-SysRq magic system requests don't work. I am unable to get a batch buffer.
I have posted at the phoronix forums: http://www.phoronix.com/forums/showthread.php?t=24205 and it was recommended that I file a bug report. Also see the phoronix review which mentions hard lockups: http://www.phoronix.com/scan.php?page=article&item=intel_clarkdale_gpu&num=1
I have set the severity as 'blocker' as this system is wildly unstable - I had 3 hard lockups in 4 hours yesterday, and my first one this morning happened 3 minutes after powering on.
Can you try the latest stable kernel, 2.6.34.x? Or 2.6.35-rc2...
I've tested with a couple of different combinations. Firstly, upgrading to 2.6.35-rc2-git2 made lockups about half as frequent ... about 1 every 4 hours now.
Next I activated ecomorph ( compiz port for enlightenment-0.17 ). There were rendering issues with mesa-7.8.1, so I built mesa from git ( plus upgraded to libdrm-2.4.20 ). This fixed the rendering issues, but I started getting more frequent lockups again.
Next I upgraded xf86-video-intel from 2.11.0 to git. This gives me a hard lockup when X starts ... so I've had to downgrade that.
In all, I'm in a slightly better position than before ... still getting regular lockups, but not *quite* as many.
Actually, scratch that. Just had another 2 in quick succession ( power cycling between lockups ). Back to FBDEV :(
Strange, I've run with this configuration for quite some time. I haven't seen hangs since the PIPE_CONTROL patchset landed.
You could also try the drm-intel-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel.git, it has a couple of fixes that haven't landed upstream yet, including one that can cause a panic if page flipping is in use.
Any update here? I'm still using this config pretty happily...
It still locks my system hard.
I've tested with anholt's drm-intel-next kernel ( 2.6.34 ) as per your recommendation above, and also kept up with the latest git sources of libdrm, mesa and xf86-video-intel. This kernel has some issues, including udev consuming 100% of one of my cores. That aside ( I can kill it and it doesn't break too much ), I get anywhere from a couple of seconds to a couple of hours of usage, and then the system locks hard again.
Since this is a work desktop, I can't afford to have it crashing at all, and in particular leaving filesystem corruption. So I'm using fbdev. But I'm not impressed.
Hm ok, if you get a chance (I know what you mean about work machines, I hate when mine fails too), can you collect some additional information about the crash? http://intellinuxgraphics.org/how_to_report_bug.html has a good overview of what we like to see in order to narrow things down.
I'm not able to get *anything* out of my system after one of these lockups, so obtaining the last batchbuffer is not possible. If you want other info on my system, I'll have to rebuild my kernel with debugfs support ( or something like this ). I can do it if you think this will be handy.
debugfs is mainly handy after the GPU hangs (assuming the machine is still alive). Since your machine hard locks, it might be better to set up netconsole so you can capture any kernel log output that occurs around the time of the hang.
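For reference, a minimal sketch of what a netconsole setup might look like. It assumes the parameter layout described in the kernel's netconsole documentation (`[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]`). The helper name `netconsole_param` is hypothetical, and the ports, addresses and interface in the usage comments are placeholders, not values from this report.

```shell
# Build a netconsole= module parameter string (hypothetical helper).
# Usage: netconsole_param SRC_PORT SRC_IP DEV TGT_PORT TGT_IP [TGT_MAC]
# If TGT_MAC is omitted the field is left empty, which the kernel
# documentation describes as falling back to the broadcast address.
netconsole_param() {
    src_port=$1
    src_ip=$2
    dev=$3
    tgt_port=$4
    tgt_ip=$5
    tgt_mac=${6:-}
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Example use on the machine that locks up (as root; placeholder values):
#   dmesg -n 8    # raise the console log level so more messages are sent
#   modprobe netconsole "$(netconsole_param 6665 192.168.0.10 eth0 6666 192.168.0.20)"
# On the receiving machine, any UDP listener on the target port should do,
# for example a netcat variant (exact flags differ between netcat flavours).
```

Loading netconsole as a module only captures messages printed after the module is loaded, which should be enough for a lockup that happens well after boot.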
Any news? Another option would be to try an Ubuntu or Fedora live disc and see if they're as unstable as your current config. There are a lot of aspects of the configuration that could go wrong; using a whole separate stack would help rule some of those out.
Oh and what platform is this specifically? A Dell, HP or other? Or a home made system with a specific board?
No positive news at this point. I updated to 2.6.35-rc4+ ... d44a78e83f7549b3c4ae611e667a0db265cf2e00 in anholt's kernel tree, and also c20a3628c7c6b7c41efe309b712bf93eb4e92039 in mesa on Friday, and this didn't help things at all. I also selected debugfs and netconsole ( as a module ), but netconsole didn't build and I don't have time to figure out why at this point.
As for trying other distros - yeah I suppose I can try that. I'm currently using Sabayon, with some packages ( mesa, libdrm, kernel, etc. ) built via portage or from git.
This system is a Lenovo 'M Series' ThinkCentre, bought by my employer. It is in its stock-standard state - I'm not allowed to open the box at all.
Also, I'm starting to wonder if the issue is actually in xf86-video-intel. I get lockups whether I have any 3D clients running or not.
(In reply to comment #11)
> Oh and what platform is this specifically? A Dell, HP or other? Or a home
> made system with a specific board?
FYI: we're having the same issues with a new batch of Dell Optiplex 980 systems here at our site. All have the Clarkdale chip in them (according to the X-server).
We're running 64-bit RHEL 5.4, kernel 2.6.18-194.3.1.el5. X-server and Intel device driver RPMs:
If there is any specific debug data that might be helpful, I can see if I can generate that and submit it for review.
(In reply to comment #7)
> Hm ok, if you get a chance (I know what you mean about work machines, I hate
> when mine fails too), can you collect some additional information about the
> crash? http://intellinuxgraphics.org/how_to_report_bug.html has a good
> overview of what we like to see in order to narrow things down.
-- chipset: IGDNG_D
-- system architecture: x86_64
-- xf86-video-intel/xserver/mesa/libdrm version:
-- kernel version: 2.6.18-194.3.1.el5
-- Linux distribution: Red Hat Enterprise Linux 5.4
-- Machine or mobo model: Dell Optiplex 980
-- Display connector: displayport with DP->DVI adapter
Lockups are such that I cannot recover without poweroff/poweron. No errors related to crash are logged to local or remote syslog host, nor are there any errors reported in /var/log/Xorg.0.log. I can submit them (or other debugging output) if needed.
I am not able to get a dump of the VBIOS; the instructions for getting this dump do not work:
notam(su): lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 02)
notam(su): echo 1 > /sys/devices/pci0000:00/0000:00:02.0/rom
/sys/devices/pci0000:00/0000:00:02.0/rom: Permission denied.
Unfortunately, there is no /sys/kernel/debug/dri directory present on our systems, either, making it difficult to get a "last batch buffer before GPU hang" dump.
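One possible explanation for the missing directory is that debugfs is simply not mounted. That is an assumption: it is not confirmed for this RHEL 5 kernel, and the kernel and driver would also need debugfs support built in. A minimal sketch for checking, where the helper name `debugfs_mountpoint` is hypothetical:

```shell
# Print the mount point of the first debugfs entry in a mounts table and
# return 0, or print nothing and return 1 if none is found.
# Usage: debugfs_mountpoint [MOUNTS_FILE]   (defaults to /proc/mounts)
debugfs_mountpoint() {
    mounts=${1:-/proc/mounts}
    # Fields in /proc/mounts: device mountpoint fstype options dump pass
    awk '$3 == "debugfs" { print $2; found = 1; exit }
         END { exit !found }' "$mounts"
}

# Example use (as root):
#   if mp=$(debugfs_mountpoint); then
#       ls "$mp/dri"
#   else
#       mount -t debugfs none /sys/kernel/debug
#   fi
```

If debugfs is mounted and there is still no `dri` directory, the running kernel or driver most likely does not expose one, and mounting will not help.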
If you are actually running RHEL (not CentOS), you should probably report the bug via Issue Tracker or by calling Red Hat support. It's unlikely we'll backport fixes unless customers raise the issue via the appropriate channels.
(In reply to comment #12)
> Also, I'm starting to wonder if the issue is actually in xf86-video-intel. I
> get lockups whether I have any 3D clients running or not.
Daniel, it's likely to be a hard lockup inside the kernel. There were a *lot* of Ironlake fixes that we have pushed recently, can you test
which is 2.6.36-rc3 + regression fixes?
> Daniel, it's likely to be a hard lockup inside the kernel. There were a *lot*
> of Ironlake fixes that we have pushed recently
Just tested 2.6.36 and this still locks hard every 2 hours or so.
Assuming we've fixed this in 2.6.37 or the current Linus tree; please re-open if not.
Just did a fresh install with a 2.6.37 kernel and tried the Intel driver again. Result: constant hard lockups :(
Re-opening and switching back to fbdev once more ...
We might just have to buy an affected machine to root cause this one, it worries me.
Gordon, is this something you can reproduce with your systems?
(In reply to comment #21)
> Gordon, is this something you can reproduce with your systems?
No, Clarkdale works quite solid on my side.
Daniel, can you reproduce this on your system with a different distro, e.g. a recent Ubuntu livecd or something? I suspect a configuration issue somehow...
Downgrading priority; if it wasn't fixed in the last 4 releases, it's not going to be fixed by tomorrow.
Timeout; hopefully this isn't an issue anymore anyway (and please confirm with another distro before re-opening too, to rule out config).