5207R/5208R/5209R YUM updates

Posted by: mstauber Category: General

We just published several updates that should address various stability aspects of CCEd.

The following updates were just released for 5207R, 5208R and 5209R:

  • sausalito-cce-server
  • sausalito-cce-client
  • base-alpine
  • base-swupdate
  • base-vsite (5209R only)

I'd also like to apologise for the problems we created with a well-intentioned new feature between Christmas and New Year. Back then we released a new extra (base-memcache) that was supposed to speed up CCEd, which it did. But it also caused issues and problems such as:

  • Sporadic Active Monitor emails in Spanish
  • Sporadic GUI login problems with weird error messages
  • Erratic behaviour of cronjobs that interface with CCEd
  • Runaway CCEd child processes
  • Expired Autoresponders that started to auto-respond again
  • Active Monitor emails to non-existing accounts
  • Other weird issues (too many to name)

We have rolled out six or seven Memcache-related fixes since New Year's Eve, including an update that disabled Memcache entirely. Still, the problems wouldn't go away: even with Memcache deactivated, CCEd would behave erratically. It was less erratic than with Memcache enabled, but we certainly can't have that either.

So I just published updates that uninstall base-memcache and bring sausalito-cce-* back to the same state as it was before the Memcache feature was added.

This should end all of the mysterious problems that we have seen cropping up in the last 2-3 weeks. And I thoroughly and sincerely apologise for these issues. This is a lesson learned and we won't have that happen again.

How did this all happen? Well, our intention was good. We wanted to speed up access to the CODB database that CCEd (the GUI backend) uses. Both CCEd and CODB are rock-solid pieces of technology, but they're not exactly the fastest by design. Anyone familiar with database design will know that a lack of proper indexing slows down all FIND requests, because you then have to loop over all relevant database entries to find the one(s) you're looking for. The bigger the database gets, the longer it takes for a FIND query to finish.
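To illustrate why a missing index hurts, here is a minimal Python sketch that compares an unindexed FIND (a full scan) with an indexed one. The object layout, the property names and the function names are made up for illustration; this is not the real CODB format or the CCEd API.

```python
from collections import defaultdict

# Hypothetical stand-in for a tiny object store; not the real CODB format.
objects = {
    1: {"class": "System", "name": "server"},
    2: {"class": "Vsite", "name": "site1"},
    3: {"class": "User", "name": "alice"},
    4: {"class": "User", "name": "bob"},
}

def find_scan(store, key, value):
    """Unindexed FIND: inspect every object, so cost grows with store size."""
    return sorted(oid for oid, obj in store.items() if obj.get(key) == value)

def build_index(store, key):
    """Build a value -> object IDs map for one property in a single pass."""
    index = defaultdict(list)
    for oid, obj in store.items():
        if key in obj:
            index[obj[key]].append(oid)
    return index

def find_indexed(index, value):
    """Indexed FIND: one dictionary lookup, independent of store size."""
    return sorted(index.get(value, []))

class_index = build_index(objects, "class")
print(find_scan(objects, "class", "User"))
print(find_indexed(class_index, "User"))
```

Both calls return the same IDs. The difference is that the scan touches every object on every request, while the index pays that cost once when it is built and then has to be kept up to date on every write.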

The BlueOnyx GUI uses a lot of FIND requests. On some pages more, on some pages less. Any SET or GET transaction is usually only done after a FIND request has identified the database Object(s) that we need to access. Therefore: Speeding up FIND transactions by providing proper database indexing would speed things up considerably.

The Memcache feature was our attempt to achieve this speedup. For that purpose CCEd was extended with methods that use the "memcached" service to create and maintain an index of database Objects and the keys they contain. Any FIND request would first hit the cache, which then (very speedily) returned the IDs of the Objects we were looking for.
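The general idea of answering FIND from a cache first can be sketched as follows. This is a generic cache-first pattern in Python, with a plain dict standing in for memcached. The class name, the key format and the invalidation rule are assumptions for illustration and not how CCEd actually implemented it. It also makes no attempt to reproduce the fault described in this post.

```python
class CachedFinder:
    """Cache-first FIND: consult a cache, fall back to a full scan on a miss."""

    def __init__(self, store):
        self.store = store   # object ID -> properties (hypothetical layout)
        self.cache = {}      # plain dict standing in for memcached
        self.scans = 0       # number of slow full scans performed

    def find(self, cls):
        key = "find:" + cls
        if key in self.cache:
            # Cache hit: return a copy so callers cannot alter the cached list.
            return list(self.cache[key])
        # Cache miss: do the slow scan once and remember the result.
        self.scans += 1
        ids = sorted(oid for oid, obj in self.store.items()
                     if obj["class"] == cls)
        self.cache[key] = ids
        return list(ids)

    def create(self, oid, props):
        """Writes must drop the affected cache entry, or FIND goes stale."""
        self.store[oid] = props
        self.cache.pop("find:" + props["class"], None)

finder = CachedFinder({1: {"class": "System"}, 2: {"class": "User"}})
finder.find("User")
finder.find("User")                    # served from the cache, no second scan
finder.create(3, {"class": "User"})    # invalidates the cached "User" entry
print(finder.find("User"), finder.scans)
```

The sketch shows the trade-off of any such cache: reads get cheap, but every write path has to keep the cache in step with the database.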

Sadly something did not go exactly right. We are still trying to identify the origin of the fault. But the symptoms were like this: At one time or another CCEd would enter a fault state where a GET request to a valid Object would return an error message such as "301 UNKNOWN CLASS" even though the Object was valid. Most typically this happened with the "System" Object, which contains configuration data such as language settings and the general state and configuration of the server.

Any GUI page and any GUI script, handler or constructor depends on the presence and availability of the "System" Object. If that's inaccessible, then all hell breaks loose and you see error messages and very erratic behaviour.

Most unfortunately, CCEd exhibited these problems even if Memcache had been disabled in the GUI. It just happened less frequently than with Memcache enabled. This was as unexpected as it was unwelcome.

To address these issues we just rolled back almost all Memcache-related changes:

CCEd got replaced with the same code that we were using before the Memcache feature was added. Additionally, installing the updated sausalito-cce RPM also removes the base-memcache module that provided the GUI integration of Memcache, because with the feature removed we don't want its GUI pages to remain behind either.

Where we'll be going from here:

In the meantime we have been contemplating ideas, concepts and general design changes that will help us to prevent these problems (and similar ones) in the future.

Among the problems we identified is the need for proper indexing of the CODB database to speed up FIND requests. We do have some ideas how we can achieve this without breaking CCEd. However, this will take some time to code and naturally we'll test it properly before we even consider a release.

Secondly we identified (and fixed) several "speed bumps" in existing GUI pages and libraries. In the last couple of days I released an updated base-alpine which reduces the number of redundant FIND and GET requests on all GUI pages by a factor of 5-6. In terms of speed increase (even without Memcache) that boils down to page processing and loading that is 0.5-1.0 seconds faster on an average server.
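One common way to cut redundant requests is to remember, for the duration of a single page, which Objects have already been fetched. The Python sketch below shows that idea only. The class names and the backend stub are hypothetical, and it is not the actual change made in base-alpine.

```python
class CountingBackend:
    """Hypothetical backend stub that counts how many GET requests it serves."""

    def __init__(self, data):
        self.data = data
        self.get_calls = 0

    def get(self, oid):
        self.get_calls += 1
        return self.data[oid]

class PageContext:
    """Remembers Objects already fetched while one page is being rendered."""

    def __init__(self, backend):
        self.backend = backend
        self._seen = {}

    def get(self, oid):
        # Only the first request for an Object reaches the backend.
        if oid not in self._seen:
            self._seen[oid] = self.backend.get(oid)
        return self._seen[oid]

backend = CountingBackend({1: {"class": "System", "locale": "en_US"}})
page = PageContext(backend)
for _ in range(6):
    page.get(1)          # six lookups from different parts of the page
print(backend.get_calls)
```

Because the remembered Objects live only as long as the page context, nothing needs to be invalidated between requests, which keeps this approach much simpler than a shared cache.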

We also identified other areas where slightly different database layout or structuring would be beneficial and found pages that make redundant FIND or GET requests which we can subsequently eliminate for speed gains.

But we are certainly not going to release drastic changes again without knowing full well what implications they will have on production servers and "real world" scenarios.

Reliable restarts of CCEd during YUM updates:

Among the fixes released today is an updated base-swupdate module. This tackles one long-standing issue that has plagued us: Certain RPMs need to restart (or rehash) CCEd upon YUM updates. We need to do that to push out configuration changes or minor and major modifications of the CODB database schemas.

Pretty much any modest feature change needs a CCEd restart or a CCEd rehash (which is a fast restart of CCEd).

The mechanism we used for that was a carry-over from the Cobalt Networks days. It had a few conceptual problems and didn't work reliably enough, even less so if the YUM update had been issued through the GUI.

To address this I wrote a YUM plugin that now restarts or rehashes CCEd at the end of a "yum update" or "yum install" if the GUI RPMs require it. That way we only restart or rehash CCEd once at the end of a "yum update" (if at all) and do it with much greater reliability and certainty. A lot of the support cases and problem reports on the BlueOnyx list (or in tickets or by email) were from people who had issues because a mandatory CCEd restart had not been performed after a YUM update. The new YUM plugin will solve that particular problem once and for all.
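The "collect requests during the transaction, act once at the end" logic can be sketched in plain Python as follows. This is not the actual plugin code. The class name, the way packages signal their needs, and the rule that a restart supersedes a rehash are all assumptions made for illustration. In a real YUM plugin, the final step would presumably run from a post-transaction hook.

```python
# Ordered from weakest to strongest action (assumed ordering).
ACTIONS = ("none", "rehash", "restart")

class DeferredCcedAction:
    """Collects restart/rehash requests and performs at most one at the end."""

    def __init__(self, runner):
        self.runner = runner     # callable that actually performs the action
        self.pending = "none"

    def request(self, action):
        """Record what a package needs; keep only the strongest request."""
        if action not in ACTIONS:
            raise ValueError("unknown action: %r" % (action,))
        if ACTIONS.index(action) > ACTIONS.index(self.pending):
            self.pending = action

    def finish(self):
        """Call once after the whole transaction; runs the action if needed."""
        action, self.pending = self.pending, "none"
        if action != "none":
            self.runner(action)
        return action

performed = []                                   # stand-in for the service call
deferred = DeferredCcedAction(performed.append)
for wanted in ("rehash", "restart", "rehash"):   # three packages in one update
    deferred.request(wanted)
deferred.finish()
print(performed)
```

However many packages ask for a rehash or restart during one update, the runner is called at most once, and not at all if no package asked.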

Pending support tickets and support requests by email:

To anyone who has an open support ticket or an unanswered email: My sincere apologies. I'll get to them as soon as I can. But as you can imagine, fixing these stability issues took a lot of time and energy, and generated a flood of support requests as well. Due to that both Greg and I are totally backlogged with tickets. Working through them will take some time, but we will get back to you as quickly as we can.

Thank you for your patience!

Jan 14, 2016