Bug 26099

Summary: test many simultaneous connections
Product: Telepathy
Component: fargo
Reporter: Simon McVittie <smcv>
Assignee: David Laban <david.laban>
QA Contact: Simon McVittie <smcv>
Status: RESOLVED FIXED
Severity: enhancement    
Priority: high    
Version: git master   
Hardware: Other   
OS: All   
Whiteboard: milestone4.9; done=20h; est=30h
Bug Depends on: 26142    
Bug Blocks: 26257, 26277    

Description Simon McVittie 2010-01-18 07:28:14 UTC
In theory Fargo supports lots of simultaneous connections/calls, but this hasn't been verified.
Comment 1 Simon McVittie 2010-01-27 09:49:32 UTC
Using this bug to represent the actual testing, est=5h. I'll open another bug for using a pool of CMs, which might be necessary for scalability or robustness, or might be descoped.
Comment 2 Simon McVittie 2010-02-03 09:32:50 UTC
This is taking considerably longer than I'd hoped; reasonable numbers of parallel connections end up hitting what appear to be race conditions in the stress-test script.
Comment 3 Simon McVittie 2010-02-08 13:39:26 UTC
done+=5h est+=5h... little progress to show for today's work on this
Comment 4 Simon McVittie 2010-02-08 13:48:06 UTC
The manual loopback test has some problems. Since it's the building block for the multi-loopback stress test, I'm trying to make it more stable.

One symptom is as follows:

* run the manual loopback test: it passes
* delete everything from the parameters table in the database
* run the manual loopback test (it registers its user pair, which is needed this time): it fails
* run the manual loopback test again (it registers its user pair, which is a no-op this time): it passes

It appears that the receiver doesn't get the incoming channel, although there's currently no debug output for NewChannels, so it's hard to tell where the failure is. My first job for tomorrow is to add more debug output.

A note for anyone else hacking on this: it turns out that

* telepathy-sofiasip doesn't reliably die when with-session-bus.sh exits
* openser can get confused in the aftermath of a failed test (?)

so it's worth killing all related processes and restarting openser if in doubt.
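
The cleanup advice above could be scripted roughly as follows. This is only a sketch: the process list, the use of pkill, and the openser restart command (an init script path here) are assumptions rather than anything taken from the Fargo tree, so adjust them for the local setup.

```python
import subprocess

# Assumption: these are the processes worth killing; extend as needed.
STALE_PROCESSES = ["telepathy-sofiasip"]
# Assumption: openser is managed by an init script at this path.
OPENSER_RESTART = ["/etc/init.d/openser", "restart"]


def cleanup_commands(processes=STALE_PROCESSES, restart=OPENSER_RESTART):
    """Return the commands to run, in order, without running them."""
    # pkill -f matches against the full command line, so long process
    # names are not cut off the way plain process-name matching can be.
    commands = [["pkill", "-f", name] for name in processes]
    commands.append(list(restart))
    return commands


def cleanup(run=subprocess.call):
    """Run each cleanup command; return a list of (command, exit status).

    subprocess.call is used rather than check_call because pkill exits
    non-zero when nothing matched, which is not an error here.
    """
    return [(cmd, run(cmd)) for cmd in cleanup_commands()]
```

Calling cleanup_commands() first shows what would be run before anything is actually killed.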
Comment 5 Simon McVittie 2010-02-09 11:47:22 UTC
Still not making significant progress here, unfortunately...

Today's progress:

* Discovered that for "a while" (~ a minute) after startup, the loopback test script doesn't work, possibly because ejabberd and Fargo are still handshaking
* Many test runs with 10-25 users
* Improved debug logging
* Disabled the registration part of the test, and instead registered users in the database directly, to get rid of a point of fragility
* Disabled reconnecting (3 calls x 3 connections) and just did 10 calls on a single connection, to get rid of another point of fragility
* The usual failure mode seems to be that session-initiate isn't sent to the receiving XMPP client, possibly because SetRemoteCodecs hasn't been emitted by the receiver's telepathy-sofiasip instance (?)
Comment 6 Simon McVittie 2010-02-09 11:49:33 UTC
Unaddressed review complaints for smcv/stress2 for reference:

18:39 < alsuren> +echo "delete from parameters;" | psql ${DB:-tpfargo} -- could you just make DB be $4 or tpfargo when you set it at the top of the file?

18:34 < alsuren> +        (the script will exit successfully after//+        CALLS_PER_CONNECTION * CONNECTIONS calls) -- do you mean that * (n_max - n_min)?
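
For reference, the arithmetic the second complaint is asking about, as a sketch. The function and parameter names are illustrative, and whether the script really multiplies by the size of the user range is exactly the open question, so the third factor is the reviewer's suggestion rather than confirmed behaviour.

```python
def expected_calls(calls_per_connection, connections, n_min=None, n_max=None):
    """Number of calls before the script exits, under two readings.

    Without n_min/n_max this is the count stated in the help text.
    With them, it is the reviewer's suggested reading, assuming the
    user range is half-open, i.e. covers n_max - n_min users.
    """
    per_user = calls_per_connection * connections
    if n_min is None or n_max is None:
        return per_user
    return per_user * (n_max - n_min)
```

With 3 calls on each of 3 connections, expected_calls(3, 3) gives 9; if a hypothetical range of ten users (n_min=0, n_max=10) each did that, the reviewer's reading would give 90.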
Comment 7 David Laban 2010-02-12 10:50:43 UTC
http://git.collabora.co.uk/?p=user/smcv/telepathy-fargo.git;a=commitdiff;h=86cea12ed22104b4cc521024613affb978e852a5 -- would be clearer if you didn't have to glue DelayedCall objects onto the WaitState object, or if you had helpers to do it for you.

otherwise, stress3 and stress4++
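
One possible shape for such a helper, as a sketch only. WaitState here is a stand-in written for illustration, not the class in the branch, and its method names are made up. call_later is expected to behave like Twisted's reactor.callLater, whose return value offers active() and cancel().

```python
class WaitState:
    """Hypothetical stand-in for the stress test's wait-state object."""

    def __init__(self, name):
        self.name = name
        self.done = False
        self.timed_out = False
        self._timeout_call = None

    def arm_timeout(self, call_later, seconds):
        """Schedule a timeout and remember it, so callers never have to
        attach the delayed call to this object by hand."""
        self._timeout_call = call_later(seconds, self._on_timeout)

    def _on_timeout(self):
        # The delayed call has fired, so there is nothing left to cancel.
        self._timeout_call = None
        if not self.done:
            self.timed_out = True

    def finish(self):
        """Mark the wait as satisfied and cancel any pending timeout."""
        self.done = True
        call, self._timeout_call = self._timeout_call, None
        if call is not None and call.active():
            call.cancel()
```

Under Twisted this would be used as state.arm_timeout(reactor.callLater, 30), with state.finish() called from whichever signal handler ends the wait.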
Comment 8 Simon McVittie 2010-02-15 04:06:31 UTC
Swapping assignee: David's better at profiling Python than I am.
Comment 9 Simon McVittie 2010-02-15 04:21:15 UTC
Merging stress3 and stress4 with one unaddressed review complaint, which I think is addressed by <http://git.collabora.co.uk/?p=user/smcv/telepathy-fargo.git;a=shortlog;h=refs/heads/waiter>; please review?
Comment 10 Simon McVittie 2010-02-15 04:32:07 UTC
Merged smcv/waiter too.
Comment 11 David Laban 2010-03-22 05:41:38 UTC
Seems to cope easily with a sustained rate of 1 call setup/teardown per second, now that we've fixed http://bugs.freedesktop.org/show_bug.cgi?id=26698
