Created attachment 23922 [details] my Xorg.0.log
I recently encountered a case where Xorg spun at 100% CPU, but the system was still usable. This was 2.6.99 + Xorg-1.6.0_2 (from rawhide). I collected a few stack traces; no idea how useful they are.
Created attachment 23923 [details] stacktraces
Looks like the driver is just waiting on the GPU in those traces, can you reliably cause any problems here? Or do you just see occasional spikes (which could be normal)?
Just experienced this again, this time with XAA (was running UXA last time). When I get those stack traces, Xorg consumes 100% CPU on one core, although no graphics work is going on. Looking at "top", Xorg is the only process consuming a lot of cycles, so I can't imagine this is just a misbehaving X client feeding the server with tons of input. The stack traces also indicate that nothing else is going on in the server. Just restarting X doesn't help here; I have to reboot to make it work again.
Sorry, I just saw that both were on UXA. The request for XAA seems to be ignored.
Were the stack traces the same?
Yes, they were the same. Unfortunately I did not find a way to trigger the problem; it has always happened to me by accident.
Another thing to capture would be the kernel stack trace from the process. You should be able to get it by using sysrq-t and capturing dmesg (or echo t > /proc/sysrq-trigger).
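For anyone following along, here is a minimal sketch of that capture step in Python. It assumes root privileges and a kernel that exposes /proc/sysrq-trigger; the function names `request_task_dump` and `capture_kernel_log` are made up for illustration and are not part of any tool mentioned in this report.

```python
import subprocess


def request_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel to log the stack of every task (same as sysrq-t).

    Writing the single character "t" is what `echo t > /proc/sysrq-trigger`
    does. trigger_path is a parameter only so the function can be tried
    against an ordinary file.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("t")


def capture_kernel_log():
    """Return the current kernel ring buffer as text by running dmesg."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `request_task_dump()` followed by `capture_kernel_log()` while Xorg is spinning, and saving the returned text to a file, should give roughly what was asked for here. The ring buffer can overflow on a system with many tasks, so the start of the dump may be missing.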
OK, when it happens again I'll try to get some kernel stack traces.
Created attachment 24414 [details] kernel stacktraces
Created attachment 24415 [details] more xorg-stacktraces
VT switching also doesn't seem to work in this situation, thanks for the sysrq-trigger tip :)
restarting the X-server doesn't seem to help - both issues persist. VT switching is still broken, and the x-server process keeps spinning.
I still don't have any ideas on this one. Would it be possible for you to get sysprof output during the CPU spike?
Thanks for looking at that issue. I can trigger the spinning by switching between runlevel 3 and runlevel 5 a few times. I don't even need to log in with kdm. Another thing I noted: when I kill X (Ctrl+Alt+Backspace) I can't type on the VT console anymore. Each keypress results in multiple garbage characters. top says X consumes about 10% of one CPU in userspace and 90% in kernel space. OProfile gave me:

samples  %        image name  app name  symbol name
34       21.7949  vmlinux     vmlinux   read_hpet
9        5.7692   vmlinux     vmlinux   acpi_idle_do_entry
6        3.8462   vmlinux     vmlinux   acpi_idle_enter_bm
5        3.2051   vmlinux     vmlinux   system_call
4        2.5641   vmlinux     vmlinux   __ticket_spin_unlock
4        2.5641   vmlinux     vmlinux   acpi_os_read_port
4        2.5641   vmlinux     vmlinux   do_select
4        2.5641   vmlinux     vmlinux   native_flush_tlb_single
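A side note on reading the OProfile table above: the sample counts and percentages together imply how many samples the whole profile contains. The small Python sketch below does that arithmetic on the rows as posted; the variable and function names are only illustrative.

```python
# (symbol, samples, percent) rows copied from the OProfile output above.
ROWS = [
    ("read_hpet", 34, 21.7949),
    ("acpi_idle_do_entry", 9, 5.7692),
    ("acpi_idle_enter_bm", 6, 3.8462),
    ("system_call", 5, 3.2051),
    ("__ticket_spin_unlock", 4, 2.5641),
    ("acpi_os_read_port", 4, 2.5641),
    ("do_select", 4, 2.5641),
    ("native_flush_tlb_single", 4, 2.5641),
]


def implied_total(samples, percent):
    """Total sample count implied by one row: samples / (percent / 100)."""
    return samples / (percent / 100.0)


# Every row should imply the same total if the table is self-consistent.
totals = [round(implied_total(s, p)) for _, s, p in ROWS]
listed_samples = sum(s for _, s, _ in ROWS)
listed_share = 100.0 * listed_samples / totals[0]
```

Each row works out to the same total, and the rows shown cover a bit under half of it. With so few samples overall, the ranking below read_hpet is probably too noisy to read much into.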
Created attachment 25442 [details] sysprof profile of spinning xorg
> --- Comment #15 from Clemens Eisserer <linuxhippy@gmail.com> > 2009-05-04 15:56:15 PST --- Created an attachment (id=25442) > --> (http://bugs.freedesktop.org/attachment.cgi?id=25442) > sysprof profile of spinning xorg Any chance you could install debug symbols for your kernel & X server so we can see more details here? At first blush it appears something is stuck calling gettimeofday in a tight loop.
I had/have debug symbols installed for both Xorg and the kernel; the symbol names do show up in the profile.
Do you know how I can tell sysprof where to look for vmlinux + kernel-module debug info? I only found that information for oprofile, but I don't understand its call graphs. Thanks, Clemens
No, sorry. It usually "just works" for me...
Have you seen this recently Clemens? Or do you think it's fixed?
Haven't seen this for a very long time, but I experienced it today with 2.6.31rc5 + intel-2.8 (the versions deployed with fedora rawhide 11.91, 20080818). Attached are user/kernel-space stack traces.
Created attachment 28747 [details] kernel stacktraces of 2.6.31rc5
Created attachment 28748 [details] userspace stack traces of 2.8
This sounds like (but I didn't spot any corroborating evidence in your profile due to lack of kernel symbols) the fence thrashing bug fixed by:

commit a09ba7faf75fa4b21980d81de8e5f3d5c0785ccf
Author: Eric Anholt <eric@anholt.net>
Date: Sat Aug 29 12:49:51 2009 -0700

    drm/i915: Fix CPU-spinning hangs related to fence usage by using an LRU.

    The lack of a proper LRU was partially worked around by taking the fence from the object containing the oldest seqno. But if there are multiple objects inactive, then they don't have seqnos and the first fence reg among them would be chosen. If you were trying to copy data between two mappings, this could result in each page fault stealing the fence from the other argument, and your application hanging.
Closing due to lack of activity, probably fixed though.
Haven't seen this for a long time, guess it's fixed :)
Cool, see we fix bugs! We just don't always correlate the fixes back to bug reports. :p
I've just seen it again, on kernel-2.6.31.1 + intel-2.9.0, when I switched runlevels 5->3->5. As in the reports before, Xorg spins using one core but everything else seems to work. I noticed it because my laptop lost battery charge quite fast and the fan started to blow.
Can you capture a sysprof of the new spinning? Is it the same as before?
Created attachment 30412 [details] sysprof log with 2.9 + 2.6.31.1
Created attachment 30413 [details] pstack-output from spinning Xorg, 2.9 + 2.6.31.1
Sorry, I missed the update. Can you get sysprof output with kernel symbols? Usually your distro has a kernel debug package that will provide them. Most of the time is definitely in the kernel in your sysprof output, but we can't tell where.
Timeout. Hope this isn't still occurring. Maybe Chris can take a look if it is.
Please reopen, this just happened again with kernel-2.6.32.8-58.fc12.i686 and intel-2.9.1. All I had to do was:
- Boot into runlevel 5
- Log into KDE
- Log out
- "init 3" in VT
- "init 5" in VT
Xorg was spinning again. Thanks to the kernel's new profiling framework I am now able to provide a user+kernel profile, as attached.
Created attachment 33600 [details] sysprof profile (user+kernelspace)
Woo driver funkiness.
Any ideas Chris?
It looks like the X server has been sent into a spin around select(). Possibly an invalid fd in one of its fdsets? strace would confirm that it is continuously calling select, and test the hypothesis that select is returning an error. I will try to reproduce this locally later, though it might be machine dependent. At this point there is nothing to indicate the cause of the spin.
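To illustrate the hypothesis only (this is not the X server's code, and not a confirmed diagnosis), here is a small Python sketch of how a stale descriptor turns a select() loop into a busy loop: select() fails immediately with EBADF instead of blocking, so a loop that simply retries never sleeps. The helpers `make_stale_fd` and `count_failed_selects` are hypothetical names invented for this example.

```python
import errno
import os
import select


def make_stale_fd():
    """Return a descriptor number that has already been closed."""
    read_end, write_end = os.pipe()
    os.close(read_end)
    os.close(write_end)
    return read_end


def count_failed_selects(fd, attempts=1000, timeout=1.0):
    """Call select() on fd `attempts` times; count immediate EBADF failures.

    With a valid, idle fd each call would block for up to `timeout` seconds.
    With a stale fd every call returns an error at once, which is what a
    retry loop would experience as a 100% CPU spin.
    """
    failures = 0
    for _ in range(attempts):
        try:
            select.select([fd], [], [], timeout)
        except OSError as exc:
            if exc.errno != errno.EBADF:
                raise
            failures += 1
    return failures
```

Running `count_failed_selects(make_stale_fd())` returns almost instantly despite the one-second timeout, because none of the calls ever waits. In an strace of a process stuck this way one would expect a continuous stream of select calls each returning an error, which is the check suggested above.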
still spins with kernel-2.6.33.5 + intel-2.11. I'll try to strace the server some time soon.
Any luck Clemens? This issue is just so bizarre that I am curious to know what the cause is.
This is a likely culprit in conjunction with a spin in select:

commit c882f6a22a862c1664c375e05e5e6fc4bdb04edb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Aug 18 10:21:22 2010 +0100

    Move registration of vsync fd from pre-init to screen-init

    Marty Jack reported an issue he found where the page-flipping handler was being lost on server reset. This results in the swap completion notification being lost, with the sporadic hang of full screen applications like Compiz, flash and even glxgears!

    Fixes: Bug 29584 - Server in compute loop
    https://bugs.freedesktop.org/show_bug.cgi?id=29584

    There are also several possibly related bugs with similar symptoms, i.e. OpenGL applications hanging on missed swap notifications.

    Reported-by: Marty Jack <martyj19@comcast.net>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Keith Packard <keithp@keithp.com>
Created attachment 38462 [details] [review] Kill -EIO from tcflush
And this patch from Adam Jackson seems more relevant.
The potential EIO spin on vt-switch fits the bug description and profiles, so presuming fixed.