I'm working with a complex system that has a couple of decades of history behind it. It used to be a client/server dispatching app, but it's gotten more complicated. Originally, each customer had his own instance, running on his own servers. Now, we have some customers who are still running in this mode, but we have some who are running in a Software as a Service mode - where all the applications are running on our servers. And we've added web interfaces, and we have hundreds of customers who access their systems solely through the web.
As it currently exists, each installation of the system consists of:
We have upwards of a dozen different installations, each serving from one to several hundred customers. (And from a handful to several hundred users per customer.)
The older background apps are written in unmanaged C++, the newer ones in C#. The client apps are written in VB6, running against a COM object written in unmanaged C++. The websites and services are ASP.NET and ASP.NET MVC, written in C#.
Clearly, it's gotten quite complicated over the years, with a lot of parts and a lot of inter-relationships. That it still works, and works well, surprises me, and makes me think we didn't do too badly when we first architected the beginnings, 20 years ago. But...
At this point, our biggest problem is the effort needed to install updates and upgrades. Much of the system is decoupled, so we can change one communications program, or fix a web page, etc., without much difficulty. But any change to the database schema pretty much mandates a system-wide change. That takes significant time, affects many customers, and involves significant risk. So the implementation of fixes gets delayed, which makes the risk even higher when we do finally upgrade, which results in more delay, and it's generally hurting our responsiveness.
So, I'm looking for advice as to architectural changes we might make, that would make upgrades less risky and less expensive.
In my ideal world, we'd never upgrade a running installation, we'd install an upgrade in parallel, test it, and once we were confident that it was working, we'd move customers from the old system to the new one, at first one at a time, and then later in bulk as we grew confident. And where we could roll a customer back to the old system, if things didn't work. But I see some problems with that:
What we have is working, but it's not working well. And I was hoping for some advice.
I'm not looking for answers, exactly, but more I'm looking for ideas on where to look. Anyone have any ideas on where I could find information on how to deal with structuring systems of this scale?
You mentioned that the subsystems are decoupled, so I guess it is not very problematic to replace or upgrade individual components. The major pain point seems to be the DB changes. That is to be expected, because you are using a shared DB model in a SaaS environment. The recommended DB models for a SaaS application, in order from most flexible to least flexible, are:
separate databases for each customer; a shared database with a separate schema per customer; a shared database with a shared schema.
The issue is due to the third model. It is the least expensive, but also the most rigid.
To resolve this pain, each customer can be served by a separate DB or schema instance. This can be done for a few customers at a time. For example, for customer A, create a separate DB or schema, whichever you choose. Export all the records for that customer ID and load them into the new schema. Then start routing that customer to the new DB; you still have the old DB to fall back to.
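The per-customer migration described above can be sketched roughly as follows. This is only an illustration, not your actual schema: it uses in-memory SQLite, and the `orders` table, the `customer_id` column, and the `migrate_tenant` helper are all made-up names. It also assumes the dedicated database already has the same table definitions as the shared one.

```python
import sqlite3

# Hypothetical schema, used for both the shared and the dedicated database.
SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id TEXT, item TEXT)"


def migrate_tenant(shared, dedicated, customer_id, tables):
    """Copy one customer's rows from the shared DB into a dedicated DB.

    Nothing is deleted from the shared DB, so it stays usable as a fallback.
    Table names come from a trusted list, not from user input.
    """
    copied = {}
    for table in tables:
        rows = shared.execute(
            f"SELECT * FROM {table} WHERE customer_id = ?", (customer_id,)
        ).fetchall()
        if rows:
            # One "?" placeholder per column of the source row.
            placeholders = ", ".join("?" * len(rows[0]))
            dedicated.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", rows
            )
        copied[table] = len(rows)
    dedicated.commit()
    return copied


# Shared database holding rows for two customers.
shared = sqlite3.connect(":memory:")
shared.execute(SCHEMA)
shared.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "A", "truck"), (2, "B", "van"), (3, "A", "car")],
)
shared.commit()

# Dedicated database for customer A, created with the same schema.
dedicated = sqlite3.connect(":memory:")
dedicated.execute(SCHEMA)

counts = migrate_tenant(shared, dedicated, "A", ["orders"])
```

In a real system you would also need to handle writes that arrive during the copy (for example by briefly freezing that customer), which this sketch ignores.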
In general, for web services it is always better to use a router to invoke the real web service.
I think it might already be in place. The customer actually calls the router service, which routes the call to the appropriate web service for that customer. Falling back, or routing to a different version or installation of the services, is then as simple as changing the configuration at the router. The same strategy can, in some way, be applied to the other subsystems.
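As a rough sketch of what "changing the configuration at the router" could look like: the `Router` class, its method names, and the backend URLs below are all hypothetical, and a real router would sit in front of HTTP traffic rather than return strings. The point is only that moving a customer, or rolling one back, is a single configuration change.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Maps each customer to the installation that should serve it."""

    default_backend: str
    # Per-customer overrides; customers not listed use the default.
    overrides: dict = field(default_factory=dict)

    def backend_for(self, customer_id):
        return self.overrides.get(customer_id, self.default_backend)

    def move(self, customer_id, backend):
        # Route this one customer to a different version or installation.
        self.overrides[customer_id] = backend

    def roll_back(self, customer_id):
        # Drop the override so the customer falls back to the default.
        self.overrides.pop(customer_id, None)


# Hypothetical URLs for an old and a new installation.
OLD = "https://v1.example.internal"
NEW = "https://v2.example.internal"

router = Router(default_backend=OLD)
router.move("customer-a", NEW)
```

Customers can be moved one at a time at first, and the default can be switched once you are confident in the new installation.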
Jeff, I noticed you are in Minneapolis; so am I. I will guess the initials DR or BI based on what you describe, but it's a SWAG only. Having managed a portfolio of 75+ applications for a Fortune 50 company, I know how crazy things get over 10 to 20 years as requirements continue to evolve and technologies leapfrog the original understanding of what was considered possible.
For your multi-tenant SaaS architecture, I am not sure you will get any silver-bullet answers here, but based on my experience in a startup where we are building a multi-tenant SaaS web application platform (a horizontal platform that takes care of 60% of the needs, leaving 40% to be solved by vertical apps), here are my words of wisdom:
I know it will be difficult, but try to achieve an absolute separation between your application capabilities data and your tenant data, putting them in two different schemas or databases. This will allow you to significantly reduce roll-out times and testing requirements when changes relate only to core application capabilities and have no impact on customer or user data. Basically, you will have to put your team through excruciating pain to identify and peel off all the tables that strictly define application capabilities (application metadata). Once this is achieved, you will have to refine all your development and testing processes to separate tasks that affect core application platform capabilities from those that affect vertical application capabilities and from those that are customer data-related. That's what we do, and for us rolling out upgrades to the core application platform is a matter of hours, if not less.
Separate...
Honestly, without a thorough analysis of existing landscape, I can't tell you how or if this can be achieved. Good luck!