TpIntSet currently allocates O(h) memory where h is the highest integer in the set. In particular, a set containing only the integer 0xFFFFFFFF will try to allocate and zero-fill 512M of memory, then set one bit to 1.
This is fine for service-side use, where we control the allocation policy, but it's no good for client-side use, where we want the client to be able to cope gracefully with arbitrary handle values.
It should allocate roughly O(n) memory, where n is the number of integers in the set, for any guint values in the set. Sjoerd suggests that a page-table would be an appropriate data structure.
The operations that the existing API requires us to support are:
* Test a guint for presence/absence (this should be relatively fast)
* Add a guint to the set
* Remove a guint from the set
* Count the guints in the set (performance not very important)
* Iterate over the set
* Given an arbitrary guint x, find the smallest guint larger than x that is in the set
It might be acceptable for the implementation to constrain the integers in the set to be no larger than 0xFFFFFFFF (and g_return_if_fail if they aren't). This is technically an ABI break, but in practice all we ever use them for in Telepathy is storing 32-bit handles, and this constraint is already effectively present (due to prohibitive memory requirements whenever it's not satisfied).
Another option is to divide up the malloced bitfield into 4k chunks, only zero chunks in which a bit has been set. Maintain a bitfield for which chunks are currently in use (that would be max 16k). Align the bitfield to a page boundary when (re)allocing if it grows larger than a page for extra points.
The kernel should then nicely take care of paging for you, I think. The actual operations should also be able to stay very light.
This is more important now that we're seriously thinking about Bug #23155. This single patch:
changes TpIntSet to be implemented as a hash from guint to guint32 (the 5 lowest bits are represented by a bitfield in the value, and the rest go in the key).
On 64-bit platforms we could in principle use a hash from guint to 64-bit int, and put the 6 lowest bits in the value; all the code is there, except that count_bits32() needs replacing with a version with 64-bit magic numbers (I didn't really want to waste time researching that right now).
Iteration will be somewhat slower now due to keeping the (undocumented!) rule that iteration over an int set is ordered, so I also added a TpIntSetFastIter which has a more GHashTable-like API and iterates out-of-order.
A bit scary but pretty smart :) The logic and the code looks fine to me.
Thanks, will be in 0.11.6.