Monday, 29 June 2015
Multi-Tenant Data Architecture
June 2006
Frederick Chong, Gianpaolo Carraro, and Roger Wolter
Microsoft Corporation
Applies to:
Application Architecture
Software as a Service (SaaS)
Summary: The second article in our series about designing multi-tenant applications identifies three distinct approaches for creating data architectures. (25 printed pages)
Acknowledgements
Many thanks to Paul Henry for his help with technical writing.
For further reference, an ARCast is available:
ARCast - Software As A Service
Contents
Introduction
Three Approaches to Managing Multi-Tenant Data
Choosing an Approach
Realizing Multi-Tenant Data Architecture
Conclusion
Related Guidance
Feedback
Introduction
Trust, or the lack thereof, is the number one factor blocking the adoption of software as a service (SaaS). A case could be made that data is the most important asset of any business—data about products, customers, employees, suppliers, and more. And data, of course, is at the heart of SaaS. SaaS applications provide customers with centralized, network-based access to data with less overhead than is possible when using a locally-installed application. But in order to take advantage of the benefits of SaaS, an organization must surrender a level of control over its own data, trusting the SaaS vendor to keep it safe and away from prying eyes.
To earn this trust, one of the highest priorities for a prospective SaaS architect is creating a SaaS data architecture that is both robust and secure enough to satisfy tenants or clients who are concerned about surrendering control of vital business data to a third party, while also being efficient and cost-effective to administer and maintain.
This is the second article in our series about designing multi-tenant applications. The first article, Architecture Strategies for Catching the Long Tail, introduced the SaaS model at a high level and discussed its challenges and benefits. It is available on MSDN. Other articles in the series will focus on topics such as workflow and user interface design and overall security.
In this article, we'll look at the continuum between isolated data and shared data, and identify three distinct approaches for creating data architectures that fall at different places along the continuum. Next, we'll explore some of the technical and business factors to consider when deciding which approach to use. Finally, we'll present design patterns for ensuring security, creating an extensible data model, and scaling the data infrastructure.
Three Approaches to Managing Multi-Tenant Data
The distinction between shared data and isolated data isn't binary. Instead, it's more of a continuum, with many variations that are possible between the two extremes.
Data architecture is an area in which the optimal degree of isolation for a SaaS application can vary significantly depending on technical and business considerations. Experienced data architects are used to considering a broad spectrum of choices when designing an architecture to meet a specific set of challenges, and SaaS is certainly no exception. We shall examine three broad approaches, each of which lies at a different location in the continuum between isolation and sharing.
Separate Databases
Storing tenant data in separate databases is the simplest approach to data isolation.
Figure 1. This approach uses a different database for each tenant
Computing resources and application code are generally shared between all the tenants on a server, but each tenant has its own set of data that remains logically isolated from data that belongs to all other tenants. Metadata associates each database with the correct tenant, and database security prevents any tenant from accidentally or maliciously accessing other tenants' data.
Giving each tenant its own database makes it easy to extend the application's data model (discussed later) to meet tenants' individual needs, and restoring a tenant's data from backups in the event of a failure is a relatively simple procedure. Unfortunately, this approach tends to lead to higher costs for maintaining equipment and backing up tenant data. Hardware costs are also higher than they are under alternative approaches, as the number of tenants that can be housed on a given database server is limited by the number of databases that the server can support. (Setting the AUTO_CLOSE database option, which unloads a database from memory when it has no active connections, can make an application more scalable by increasing the number of databases each server can support.)
Separating tenant data into individual databases is the "premium" approach, and the relatively high hardware and maintenance requirements and costs make it appropriate for customers that are willing to pay extra for added security and customizability. For example, customers in fields such as banking or medical records management often have very strong data isolation requirements, and may not even consider an application that does not supply each tenant with its own individual database.
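The metadata lookup that routes each tenant to its own database can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: SQLite in-memory databases stand in for separate server databases, and the catalog `TENANT_DATABASES` and the helper names are hypothetical.

```python
import sqlite3

# Hypothetical catalog: tenant name -> database location.
# ":memory:" keeps the sketch self-contained; a real catalog would hold
# server and database names.
TENANT_DATABASES = {"contoso": ":memory:", "fabrikam": ":memory:"}

_connections = {}


def connection_for(tenant):
    """Return the connection to the tenant's own database, creating it on first use."""
    if tenant not in TENANT_DATABASES:
        raise KeyError(f"unknown tenant: {tenant!r}")
    if tenant not in _connections:
        # Each connect(":memory:") call yields a separate, private database.
        conn = sqlite3.connect(TENANT_DATABASES[tenant])
        conn.execute(
            "CREATE TABLE Resumes (EmployeeID INTEGER PRIMARY KEY, Resume TEXT)"
        )
        _connections[tenant] = conn
    return _connections[tenant]


def add_resume(tenant, text):
    """Insert a resume into the given tenant's database only."""
    conn = connection_for(tenant)
    conn.execute("INSERT INTO Resumes (Resume) VALUES (?)", (text,))
    conn.commit()


def resumes(tenant):
    """List the resumes stored in the given tenant's database."""
    rows = connection_for(tenant).execute(
        "SELECT Resume FROM Resumes ORDER BY EmployeeID"
    )
    return [text for (text,) in rows]
```

Because no query ever spans two connections, a mistake in one tenant's SQL cannot reach another tenant's rows; the cost is one database, and one backup job, per tenant.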
Shared Database, Separate Schemas
Another approach involves housing multiple tenants in the same database, with each tenant having its own set of tables that are grouped into a schema created specifically for the tenant.
Figure 2. In this approach each tenant has its own separate set of tables in a common database
When a customer first subscribes to the service, the provisioning subsystem creates a discrete set of tables for the tenant and associates it with the tenant's own schema. You can use the SQL CREATE command to create a schema and authorize a user account to access it. For example, in Microsoft SQL Server 2005:
CREATE SCHEMA ContosoSchema AUTHORIZATION Contoso
The application can then create and access tables within the tenant's schema using the SchemaName.TableName convention:
CREATE TABLE ContosoSchema.Resumes (EmployeeID int identity primary key,
Resume nvarchar(MAX))
After the schema is created, it is set as the default schema for the tenant account:
ALTER USER Contoso WITH DEFAULT_SCHEMA = ContosoSchema
A tenant account can access tables within its default schema by specifying just the table name, instead of using the SchemaName.TableName convention. This way, a single set of SQL statements can be created for all tenants, which each tenant can use to access its own data:
SELECT * FROM Resumes
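A provisioning subsystem might generate these statements for each new tenant. The following Python sketch only builds the T-SQL text shown above; the function name `provisioning_statements` and the `<Account>Schema` naming rule are assumptions made for illustration, and executing the statements against a server is left out.

```python
import re

# Identifiers cannot be passed as query parameters, so they are validated
# strictly before being interpolated into the statement text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def provisioning_statements(tenant_account):
    """Build the T-SQL statements that provision one tenant's schema."""
    if not _IDENTIFIER.fullmatch(tenant_account):
        raise ValueError(f"invalid account name: {tenant_account!r}")
    schema = f"{tenant_account}Schema"
    return [
        # 1. Create the schema and authorize the tenant account to use it.
        f"CREATE SCHEMA {schema} AUTHORIZATION {tenant_account}",
        # 2. Create the tenant's copy of the default tables.
        f"CREATE TABLE {schema}.Resumes "
        "(EmployeeID int identity primary key, Resume nvarchar(MAX))",
        # 3. Make the schema the account's default so unqualified names resolve to it.
        f"ALTER USER {tenant_account} WITH DEFAULT_SCHEMA = {schema}",
    ]
```

Calling `provisioning_statements("Contoso")` yields the three statements used in the walkthrough above, in order.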
Like the isolated approach, the separate-schema approach is relatively easy to implement, and tenants can extend the data model as easily as with the separate-database approach. (Tables are created from a standard default set, but once they are created they no longer need to conform to the default set, and tenants may add or modify columns and even tables as desired.) This approach offers a moderate degree of logical data isolation for security-conscious tenants, though not as much as a completely isolated system would, and can support a larger number of tenants per database server.
A significant drawback of the separate-schema approach is that tenant data is harder to restore in the event of a failure. If each tenant has its own database, restoring a single tenant's data means simply restoring the database from the most recent backup. With a separate-schema application, restoring the entire database would mean overwriting the data of every tenant on the same database with backup data, regardless of whether each one has experienced any loss or not. Therefore, to restore a single customer's data, the database administrator may have to restore the database to a temporary server, and then import the customer's tables into the production server—a complicated and potentially time-consuming task.
The separate schema approach is appropriate for applications that use a relatively small number of database tables, on the order of about 100 tables per tenant or fewer. This approach can typically accommodate more tenants per server than the separate-database approach can, so you can offer the application at a lower cost, as long as your customers will accept having their data co-located with that of other tenants.
Shared Database, Shared Schema
A third approach involves using the same database and the same set of tables to host multiple tenants' data. A given table can include records from multiple tenants stored in any order; a Tenant ID column associates every record with the appropriate tenant.
Figure 3. In this approach, all tenants share the same set of tables, and a Tenant ID associates each tenant with the rows that it owns
Of the three approaches explained here, the shared schema approach has the lowest hardware and backup costs, because it allows you to serve the largest number of tenants per database server. However, because multiple tenants share the same database tables, this approach may incur additional development effort in the area of security, to ensure that tenants can never access other tenants' data, even in the event of unexpected bugs or attacks.
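A minimal sketch of the shared-schema layout follows, using SQLite in place of SQL Server. The table layout, the sample tenant IDs, and the helper `employees_for` are illustrative assumptions; the point is that every query the application issues must carry the tenant filter.

```python
import sqlite3


def create_shared_table(conn):
    """Create one Employees table that holds rows for every tenant."""
    conn.execute(
        "CREATE TABLE Employees ("
        "TenantID INTEGER NOT NULL, "
        "EmployeeID INTEGER NOT NULL, "
        "Name TEXT NOT NULL, "
        "PRIMARY KEY (TenantID, EmployeeID))"
    )


def employees_for(conn, tenant_id):
    """Return one tenant's employee names; the tenant ID is always a bound parameter."""
    rows = conn.execute(
        "SELECT Name FROM Employees WHERE TenantID = ? ORDER BY EmployeeID",
        (tenant_id,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
create_shared_table(conn)
# Rows from different tenants are intermingled in the same table.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1017, 1, "Alice"), (2042, 1, "Bob"), (1017, 2, "Carol")],
)
```

Forgetting the `WHERE TenantID = ?` clause in even one query would expose every tenant's rows, which is why this approach needs the additional security patterns described later.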
The procedure for restoring data for a tenant is similar to that for the separate-schema approach, with the additional complication that individual rows in the production database must be deleted and then reinserted from the temporary database. If there are a very large number of rows in the affected tables, this can cause performance to suffer noticeably for all the tenants that the database serves.
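The delete-and-reinsert step can be sketched as follows. This is an illustration under stated assumptions: two SQLite connections stand in for the production database and the backup restored to a temporary server, and `make_db` and `restore_tenant` are hypothetical helpers.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database holding the shared Employees table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employees ("
        "TenantID INTEGER NOT NULL, EmployeeID INTEGER NOT NULL, "
        "Name TEXT NOT NULL, PRIMARY KEY (TenantID, EmployeeID))"
    )
    conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def restore_tenant(production, backup, tenant_id):
    """Replace one tenant's production rows with the rows from the restored backup."""
    rows = backup.execute(
        "SELECT TenantID, EmployeeID, Name FROM Employees WHERE TenantID = ?",
        (tenant_id,),
    ).fetchall()
    # One transaction: commit if both steps succeed, roll back otherwise,
    # so the tenant is never left with its rows deleted but not reinserted.
    with production:
        production.execute("DELETE FROM Employees WHERE TenantID = ?", (tenant_id,))
        production.executemany("INSERT INTO Employees VALUES (?, ?, ?)", rows)
    return len(rows)
```

Rows that belong to other tenants are never touched, but the delete and reinsert still run against the tables that every tenant shares, which is the source of the performance impact described above.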
The shared-schema approach is appropriate when it is important that the application be capable of serving a large number of tenants with a small number of servers, and prospective customers are willing to surrender data isolation in exchange for the lower costs that this approach makes possible.
Choosing an Approach
Each of the three approaches described above offers its own set of benefits and tradeoffs that make it an appropriate model to follow in some cases and not in others, as determined by a number of business and technical considerations. Some of these considerations are listed below.
Economic Considerations
Applications optimized for a shared approach tend to require a larger development effort than applications designed using a more isolated approach (because of the relative complexity of developing a shared architecture), resulting in higher initial costs. Because they can support more tenants per server, however, their ongoing operational costs tend to be lower.
Figure 4. Cost over time for a hypothetical pair of SaaS applications; one uses a more isolated approach, while the other uses a more shared approach
Your development effort can be constrained by business and economic factors, which can influence your choice of approach. The shared schema approach can end up saving you money over the long run, but it does require a larger initial development effort before it can start producing revenue. If you are unable to fund a development effort of the size necessary to build a shared schema application, or if you need to bring your application to market more quickly than a large-scale development effort would allow, you may have to consider a more isolated approach.
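To make the tradeoff concrete, the break-even point between two cost curves can be computed directly. The sketch below assumes a deliberately simple linear cost model (a one-time development cost plus a constant monthly operating cost); the model and all figures used with it are hypothetical and not taken from the article.

```python
def cumulative_cost(initial, monthly, months):
    """Total cost after a number of months under a linear cost model."""
    return initial + monthly * months


def break_even_month(isolated, shared):
    """Return the first whole month at which the shared approach costs no more
    in total than the isolated one, or None if that never happens.

    Each argument is an (initial_cost, monthly_cost) pair of integers.
    """
    iso_initial, iso_monthly = isolated
    sh_initial, sh_monthly = shared
    if sh_initial <= iso_initial:
        return 0  # shared is already no more expensive at launch
    if sh_monthly >= iso_monthly:
        return None  # shared never catches up
    gap = sh_initial - iso_initial      # extra up-front development cost
    saving = iso_monthly - sh_monthly   # operating cost saved each month
    return -(-gap // saving)            # ceiling division
```

With an assumed isolated cost of 100,000 up front and 10,000 per month, and an assumed shared cost of 250,000 up front and 4,000 per month, the extra 150,000 is recovered at 6,000 per month, so the curves meet at month 25.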
Security Considerations
As your application will store sensitive tenant data, prospective customers will have high expectations about security, and your service level agreements (SLAs) will need to provide strong data safety guarantees. A common misconception holds that only physical isolation can provide an appropriate level of security. In fact, data stored using a shared approach can also provide strong data safety, but requires the use of more sophisticated design patterns.
Tenant Considerations
The number, nature, and needs of the tenants you expect to serve all affect your data architecture decision in different ways. Some of the following questions may bias you toward a more isolated approach, while others may bias you toward a more shared approach.
How many prospective tenants do you expect to target? You may be nowhere near being able to estimate prospective use with authority, but think in terms of orders of magnitude: are you building an application for hundreds of tenants? Thousands? Tens of thousands? More? The larger you expect your tenant base to be, the more likely you will want to consider a more shared approach.
How much storage space do you expect the average tenant's data to occupy? If you expect some or all tenants to store very large amounts of data, the separate-database approach is probably best. (Indeed, data storage requirements may force you to adopt a separate-database model anyway. If so, it will be much easier to design the application that way from the beginning than to move to a separate-database approach later on.)
How many concurrent end users do you expect the average tenant to support? The larger the number, the more appropriate a more isolated approach will be to meet end-user requirements.
Do you expect to offer any per-tenant value-added services, such as per-tenant backup and restore capability? Such services are easier to offer through a more isolated approach.
Figure 5. Tenant-related factors and how they affect "isolated versus shared" data architecture decisions
Regulatory Considerations
Companies, organizations, and governments are often subject to regulatory law that can affect their security and record storage needs. Investigate the regulatory environments that your prospective customers occupy in the markets in which you expect to operate, and determine whether they present any considerations that will affect your decision.
Skill Set Considerations
Designing single-instance, multi-tenant architecture is still a very new skill, so subject matter expertise can be hard to come by. If your architects and support staff do not have a great deal of experience building SaaS applications, they will need to acquire the necessary knowledge, or you will have to hire people who already have it. In some cases, a more isolated approach may allow your staff to leverage more of its existing knowledge of traditional software development than a more shared approach would.
Realizing Multi-Tenant Data Architecture
The remainder of this article details a number of patterns that can help you plan and build your SaaS application. As we discussed in our introductory article, a well-designed SaaS application is distinguished by three qualities: scalability, configurability, and multi-tenant efficiency. The table below lists the patterns appropriate for each of the three approaches, divided into sections representing these three qualities.
Optimizing for multi-tenant efficiency in a shared environment must not compromise the level of security safeguarding data access. The security patterns listed below demonstrate how you can design an application with "virtual isolation" through mechanisms such as permissions, SQL views, and encryption.
Configurability allows SaaS tenants to alter the way the application appears and behaves without requiring a separate application instance for each individual tenant. The extensibility patterns describe possible ways you can implement a data model that tenants can extend and configure individually to meet their needs.
The approach you choose for your SaaS application's data architecture will affect the options available to you for scaling it to accommodate more tenants or heavier usage. The scalability patterns address the different challenges posed by scaling shared databases and dedicated databases.
Table 1. Appropriate Patterns for SaaS Application
Approach | Security Patterns | Extensibility Patterns | Scalability Patterns
Separate Databases | Trusted Database Connections; Secure Database Tables; Tenant Data Encryption | Custom Columns | Single Tenant Scaleout
Shared Database, Separate Schemas | Trusted Database Connections; Secure Database Tables; Tenant Data Encryption | Custom Columns | Tenant-Based Horizontal Partitioning
Shared Database, Shared Schema | Trusted Database Connections; Tenant View Filter; Tenant Data Encryption | Preallocated Fields; Name-Value Pairs | Tenant-Based Horizontal Partitioning
Security Patterns
Building adequate security into every aspect of the application is a paramount task for any SaaS architect. Promoting software as a service basically means asking potential customers to relinquish some control of their business data. Depending on the application, this can include extremely sensitive information about finances, trade secrets, employee data, and more. A secure SaaS application is one that provides defense in depth, using multiple defense levels that complement one another to provide data protection in different ways, under different circumstances, against both internal and external threats.
Building security into a SaaS application means looking at the application on different levels and thinking about where the risks lie and how to address them. The security patterns discussed in this section rely on three underlying patterns to provide the right kinds of security in the right places:
Filtering: Using an intermediary layer between a tenant and a data source that acts like a sieve, making it appear to the tenant as though its data is the only data in the database.
Permissions: Using access control lists (ACLs) to determine who can access data in the application and what they can do with it.
Encryption: Obscuring every tenant's critical data so that it will remain inaccessible to unauthorized parties even if they come into possession of it.
Keep these patterns in mind as you read the rest of this section.
Trusted Database Connections
In a multi-tier application environment, application architects traditionally use two methods to secure access to data stored in databases: impersonation, and a trusted subsystem account.
With the impersonation access method, the database is set up to allow individual users to access different tables, views, queries, stored procedures, and other database objects. When an end user performs an action that directly or indirectly requires a call to a database, the application presents itself to the database as that user, literally impersonating the user for the purposes of accessing the database. (In technical terms, the application employs the user's security context.) A mechanism such as Kerberos delegation can be used to allow the application process to connect to the database on behalf of the user.
Figure 6. An application connects to a database using impersonation
With the trusted subsystem access method, the application always connects to the database using its own application process identity, independent of the identity of the user; the server then grants the application access to the database objects that the application can read or manipulate. Any additional security must be implemented within the application itself to prevent individual end users from accessing any database objects that should not be exposed to them. This approach makes security management easier, eliminating the need to configure access to database objects on a per-user basis, but it means giving up the ability to secure database objects for individual users.
Figure 7. An application connects to a database as a trusted subsystem
In a SaaS application, the concept of "users" is a bit more complicated than in traditional applications, because of the distinction between a tenant and an end user. The tenant is an organization that uses the application to access its own data store, which is logically isolated from data stores belonging to any other tenants. Each tenant grants access to the application to one or more end users, allowing them to access some portion of the tenant's data using end user accounts controlled by the tenant.
In this scenario, you can use a hybrid approach to data access that combines aspects of both the impersonation and trusted subsystem access methods. This allows you to take advantage of the database server's native security mechanisms to enforce the maximum logical isolation of tenant data without creating an unworkably complex security model.
Figure 8. A SaaS application connects to a database using a combination of the impersonation and trusted subsystem approaches
This approach involves creating a database access account for each tenant, and using ACLs to grant each of these tenant accounts access to the database objects the tenant is allowed to use. When an end user performs an action that directly or indirectly requires a call to a database, the application uses credentials associated with the tenant account, rather than credentials associated with the end user. (One way for the application to obtain the proper credentials is through impersonation, in conjunction with a credentialing system like Kerberos. A second approach is to use a security token service that returns an actual set of encrypted login credentials established for the tenant, that the application process can then submit to the database.) The database server does not distinguish between requests originating from different end users associated with the same tenant, and grants all such requests access to the tenant's data. Within the application itself, security code prevents end users from receiving and modifying any data that they are not entitled to access.
For example, consider an end user of a customer relationship management (CRM) application who performs an operation that queries the database for customer records matching a certain string. The application submits the query to the database using the security context of the tenant, so instead of returning all of the matching records in the database, the query only retrieves the matching rows from the tables the tenant is allowed to access. So far, so good. But suppose the end user's role only allows her to access records of customers located within a certain geographic region. (For more information about roles, see the section "Authorization" in Architecture Strategies for Catching the Long Tail, the first article in this series.) The application must intercept the query results and only present the user with the records that she is entitled to see.
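The two layers of the hybrid approach can be sketched as follows. Everything here is a hypothetical stand-in: `TENANT_ROWS` simulates what the database returns under each tenant's security context, and the `EndUser` role model (a set of permitted regions) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndUser:
    name: str
    tenant: str
    regions: frozenset  # regions this user's role may see (hypothetical role model)


# Stand-in for the rows visible to each tenant's database account.
TENANT_ROWS = {
    "contoso": [
        {"customer": "Northwind Traders", "region": "West"},
        {"customer": "Northwind Imports", "region": "East"},
    ],
    "fabrikam": [{"customer": "Northwind Ltd", "region": "West"}],
}


def query_as_tenant(tenant, search):
    """Database layer: the query runs as the tenant, so only its rows can match."""
    return [
        row for row in TENANT_ROWS[tenant]
        if search.lower() in row["customer"].lower()
    ]


def search_customers(user, search):
    """Application layer: filter the tenant's results down to the user's role."""
    rows = query_as_tenant(user.tenant, search)
    return [row["customer"] for row in rows if row["region"] in user.regions]
```

A Contoso user restricted to the West region who searches for "northwind" sees only "Northwind Traders": the database has already excluded Fabrikam's rows, and the application has excluded Contoso's East-region customer.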
Secure Database Tables
To secure a database on the table level, use SQL's GRANT command to grant a tenant user account access to a table or other database object:
GRANT SELECT, UPDATE, INSERT, DELETE ON [TableName] TO [UserName]
This adds the user account to the ACL for the table. If you use the hybrid approach to database access discussed earlier, in which end users are associated with the security contexts of their respective tenants, this only needs to be done once, during the tenant provisioning process; any end user accounts created by the tenant will be able to access the table.
This pattern is appropriate for use with the separate-database and separate-schema approaches. In the separate-database approach, you can isolate data by simply restricting access on a database-wide level to the tenant associated with that database, although you can also use this pattern on the table level to create another layer of security.
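Because the grants are issued once per tenant at provisioning time, they are easy to generate. The sketch below builds the T-SQL text only; `grant_statements` is a hypothetical helper, and running the statements against a server is out of scope.

```python
import re

# Object and account names cannot be bound as parameters, so they are
# validated before being placed into the statement text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_ALLOWED = ("SELECT", "UPDATE", "INSERT", "DELETE")


def grant_statements(tenant_account, tables, permissions=_ALLOWED):
    """Build one GRANT statement per table for a tenant's database account."""
    for name in (tenant_account, *tables):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    for permission in permissions:
        if permission not in _ALLOWED:
            raise ValueError(f"unsupported permission: {permission!r}")
    granted = ", ".join(permissions)
    return [f"GRANT {granted} ON [{table}] TO [{tenant_account}]" for table in tables]
```

Passing a narrower `permissions` tuple, such as `("SELECT",)`, would produce read-only grants for tables a tenant should not modify.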
Tenant View Filter
SQL views can be used to grant individual tenants access to some of the rows in a given table, while preventing them from accessing other rows.
In SQL, a view is a virtual table defined by the results of a SELECT query. The resulting view can then be queried and used in stored procedures as if it were an actual database table. For example, the following SQL statement creates a view of a table called Employees, which has been filtered so that only the rows belonging to a single tenant are visible:
CREATE VIEW TenantEmployees AS
SELECT * FROM Employees WHERE TenantID = SUSER_SID()
This statement obtains the security identifier (SID) of the user account accessing the database (which, you'll recall, is an account belonging to the tenant, not the end user) and uses it to determine which rows should be included in the view. (The example assumes that the unique tenant ID number is identical to the tenant's SID. If this is not the case, one or more additional steps would be required to associate each tenant with the correct rows.) Each individual tenant's data access account would be granted permission to use the TenantEmployees view, but granted no permissions to the Employees source table itself. You can build queries and stored procedures to take advantage of views, which provides tenants with the appearance of data isolation even within a multi-tenant database.
This pattern is slightly more complex than the Secure Database Tables pattern, but is an appropriate way to secure tenant data in a shared-schema application, in which multiple tenants share the same set of tables.
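The filtering idea can be tried out locally with SQLite, with some loud caveats. SQLite has no accounts, permissions, or `SUSER_SID()`, so this sketch substitutes a per-connection temporary table, here called `CurrentTenant`, for the caller's identity; that substitution and all helper names are assumptions. It demonstrates the filter only, not the permission enforcement that SQL Server would add.

```python
import os
import sqlite3
import tempfile


def create_database(path):
    """Create the shared Employees table with rows from two tenants."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE Employees ("
        "TenantID INTEGER NOT NULL, EmployeeID INTEGER NOT NULL, Name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO Employees VALUES (?, ?, ?)",
        [(1017, 1, "Alice"), (2042, 1, "Bob"), (1017, 2, "Carol")],
    )
    conn.commit()
    conn.close()


def open_tenant_session(path, tenant_id):
    """Open a connection whose TenantEmployees view shows one tenant's rows."""
    conn = sqlite3.connect(path)
    # Temporary objects are private to this connection.
    conn.execute("CREATE TEMP TABLE CurrentTenant (TenantID INTEGER NOT NULL)")
    conn.execute("INSERT INTO CurrentTenant VALUES (?)", (tenant_id,))
    conn.execute(
        "CREATE TEMP VIEW TenantEmployees AS "
        "SELECT EmployeeID, Name FROM Employees "
        "WHERE TenantID = (SELECT TenantID FROM CurrentTenant)"
    )
    conn.commit()
    return conn


def visible_names(session):
    """Query the view exactly as if it were a single-tenant table."""
    rows = session.execute("SELECT Name FROM TenantEmployees ORDER BY EmployeeID")
    return [name for (name,) in rows]


db_path = os.path.join(tempfile.mkdtemp(), "shared.db")
create_database(db_path)
```

Two sessions opened on the same file with different tenant IDs run the identical query text and see different rows, which is the "appearance of data isolation" the pattern aims for.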
Tenant Data Encryption
A way to further protect tenant data is by encrypting it within the database, so that data will remain secure even if it falls into the wrong hands.
Cryptographic methods are categorized as either symmetric or asymmetric. In symmetric cryptography, a single key is generated and used both to encrypt and to decrypt data. In asymmetric cryptography (also called public-key cryptography), two keys are used, designated the public key and the private key. Data that is encrypted with a given public key can only be decrypted with the corresponding private key, and vice versa. Generally, public keys are distributed to any and all parties interested in communicating with the key holder, while private keys are held secure. For example, if Alice wishes to send an encrypted message to Bob, she obtains Bob's public key through some agreed-upon means, and uses it to encrypt the message. The resulting encrypted message, or ciphertext, can only be decrypted by someone in possession of Bob's private key (in practice, this should only be Bob). This way, Bob never has to share his private key with Alice. To send a message to Bob using symmetric encryption, Alice would have to send the symmetric key separately, which runs the risk that the key might be intercepted by a third party during transmission.
Public-key cryptography requires significantly more computing power than symmetric cryptography; a strong key pair can take hundreds or even thousands of times as long to encrypt and decrypt data as a symmetric key of similar quality. For SaaS applications in which every piece of stored data is encrypted, the resulting processing overhead can render public-key cryptography infeasible as an overall solution. A better approach is to use a key wrapping system that combines the advantages of both systems.
With this approach, three keys are created for each tenant as part of the provisioning process: a symmetric key and an asymmetric key pair consisting of a public key and a private key. The more-efficient symmetric key is used to encrypt the tenant's critical data for storage. To add another layer of security, a public/private key pair is used to encrypt and decrypt the symmetric key, to keep it secure from any potential interlopers.
When an end user logs on, the application uses impersonation to access the database using the tenant's security context, which grants the application process access to the tenant's private key. The application (still impersonating the tenant, of course) can then use the tenant's private key to decrypt the tenant's symmetric key and use it to read and write data.
This is another example of the defense-in-depth principle in action. Accidental or malicious exposure of tenant data to other tenants—a nightmare scenario for the security-conscious SaaS provider—is prevented on multiple levels. The first line of defense, at the database level, prevents end users from accessing the private data of other tenants. If a bug or a virus in the database server were to cause an incorrect row to be delivered to the tenant, the encrypted contents of the row would be useless without access to the tenant's private key.
The importance of encryption increases the closer a SaaS application is to the "shared" end of the isolated/shared continuum. Encryption is especially important in situations involving high-value data or privacy concerns, or when multiple tenants share the same set of database tables.
Because encrypted columns generally can't be indexed in a useful way, selecting which columns of which tables to encrypt involves making a tradeoff between data security and performance. Think about the uses and sensitivity of the various kinds of data in your data model when making decisions about encryption.
Extensibility Patterns
As designed, your application will naturally include a standard database setup, with default tables, fields, queries, and relationships that are appropriate to the nature of your solution. But different organizations have their own unique needs that a rigid, inextensible default data model won't be able to address. For example, one customer of a SaaS job-tracking system might have to store an externally generated classification code string with each record to fully integrate the system with their other processes. A different customer may have no need for a classification string field, but might require support for tracking a category ID number, an integer. Therefore, in many cases you will have to develop and implement a method by which customers can extend your default data model to meet their needs, without affecting the data model that other customers use.
Preallocated Fields
One way to make your data model extensible is to simply create a preset number of custom fields in every table you wish to allow tenants to extend.
Figure 9. A table with a preset collection of custom fields, labeled C1 through C3
In the previous figure, records from different customers are intermingled in a single table; a tenant ID field associates each record with an individual tenant. In addition to the standard set of fields, a number of custom fields are provided, and each customer can choose what to use these fields for and how data will be collected for them.
What about data types? You could simply choose a common data type for each custom field you create, but customers are likely to find this approach unnecessarily restrictive—what if a customer has a need for three additional string fields and you've only provided one string field, one integer field, and one boolean field? One way to provide this kind of flexibility is to use the string data type for every custom field, and use metadata to track the "real" data type the tenant wishes to use.
Figure 10. A custom field on a Web page, defined by an entry in a metadata table
In the example above, a tenant has used the application's extensibility features to add a text box called "Originating ZIP Code" to a data entry screen, and mapped the text box to a custom field called C1. When creating the text box, the tenant used validation logic (not shown) to require that the text box contain an integer. As implemented, this custom field is defined by a record in a metadata table that includes the tenant's unique ID number (1017), the label the tenant has chosen for the field ("Originating ZIP Code"), and the data type the tenant wants to use for the field ("int").
You can track field definitions for all of the application's custom fields in a single metadata table, or use a separate table for each custom field; for example, a "C1" table would define custom field C1 for every tenant that uses it, a "C2" table would do the same for custom field C2, and so on.
Figure 11. Storing field definitions in a single metadata table, top, and in separate tables for each custom field
The main advantage of using separate tables is that each field-specific table only contains rows for the tenants that use that field, which saves space in the database. (With the single-table approach, every tenant that uses at least one custom field gets a row in the combined table, with null fields representing available custom fields that the tenant has not used). The downside of using separate tables is that it increases the complexity of custom field operations, requiring you to use SQL JOIN statements to survey all of the custom field definitions for a single tenant.
When an end user types a quantity into the field and saves the record, the application casts the value for Originating ZIP Code to a string before creating or updating the record in the database. Whenever the application retrieves the record, it checks the metadata table for the data type to use and casts the value in the custom field back to its original type.
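The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FIELD_METADATA` dictionary stands in for the metadata table, and the type names, the `to_storage` and `from_storage` helpers, and the sample value 98052 are hypothetical, not part of the original design.

```python
# Hypothetical stand-in for the metadata table:
# (tenant_id, custom_field) -> (label, declared data type).
# The single entry mirrors the "Originating ZIP Code" example for tenant 1017.
FIELD_METADATA = {
    (1017, "C1"): ("Originating ZIP Code", "int"),
}

# Converters from the stored string back to the declared type (assumed type names).
FROM_STRING = {
    "int": int,
    "string": str,
    "bool": lambda s: s == "true",
}


def to_storage(value):
    """Cast a typed value to the string form kept in the custom column."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def from_storage(tenant_id, field, stored):
    """Look up the tenant's declared type and cast the stored string back."""
    label, type_name = FIELD_METADATA[(tenant_id, field)]
    return label, FROM_STRING[type_name](stored)


stored = to_storage(98052)                        # string written to column C1
label, value = from_storage(1017, "C1", stored)   # typed value read back
```

Keeping the converters in one lookup table means that supporting a new data type is a matter of adding an entry, rather than touching every code path that reads a custom field.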
Name-Value Pairs
The Preallocated Fields pattern explained in the previous section is a simple way to provide a mechanism for tenants to extend and customize the application's data model. However, this approach has certain limitations. Deciding how many custom fields to provide in a given table involves making a tradeoff. Too few custom fields, and tenants will feel restricted and limited by the application; too many, and the database becomes sparse and wasteful, with many unused fields. In extreme cases, both can happen, with some tenants under-using the custom fields and others demanding even more.
One way to avoid these limitations is to allow customers to extend the data model arbitrarily, storing custom data in a separate table and using metadata to define labels and data types for each tenant's custom fields.
Figure 12. An extension table allows each tenant to define an arbitrary number of custom fields
Here, a metadata table stores important information about every custom field defined by every tenant, including the field's name (label) and data type. When an end user saves a record with a custom field, two things happen. First, the record itself is created or updated in the primary data table; values are saved for all of the predefined fields, but not the custom field. Instead, the application creates a unique identifier for the record and saves it in the Record ID field. Second, a new row is created in the extension table that contains the following pieces of information:
The ID of the associated record in the primary data table.
The extension ID associated with the correct custom field definition.
The value of the custom field in the record that's being saved, cast to a string.
This approach allows each tenant to create as many custom fields as necessary to meet its business needs. When the application retrieves a customer record, it performs a lookup in the extension table, selects all rows corresponding to the record ID, and returns a value for each custom field used. To associate these values with the correct custom fields and cast them to the correct data types, the application looks up the custom field information in metadata using the extension IDs associated with each value from the extension table.
This approach makes the data model arbitrarily extensible while retaining the cost benefits of using a shared database. The main disadvantage of this approach is that it adds a level of complexity for database functions, such as indexing, querying, and updating records. This is typically the best approach to take if you wish to use a shared database, but also anticipate that your customers will require a considerable degree of flexibility to extend the default data model.
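As a rough sketch of the save and retrieve steps described in this section, the following Python example uses an in-memory SQLite database. The table names (`records`, `extension_meta`, `extension_values`), column names, helper functions, and sample values are all assumptions made for illustration; the article does not prescribe a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (
        record_id INTEGER PRIMARY KEY, tenant_id INTEGER, name TEXT);
    CREATE TABLE extension_meta (
        extension_id INTEGER PRIMARY KEY, tenant_id INTEGER,
        label TEXT, data_type TEXT);
    CREATE TABLE extension_values (
        record_id INTEGER, extension_id INTEGER, value TEXT);
""")

# Casts from the stored string back to the declared type (assumed type names).
CASTS = {"int": int, "string": str}


def define_field(tenant_id, label, data_type):
    """Register a custom field for a tenant and return its extension ID."""
    cur = conn.execute(
        "INSERT INTO extension_meta (tenant_id, label, data_type) VALUES (?, ?, ?)",
        (tenant_id, label, data_type))
    return cur.lastrowid


def save_record(tenant_id, name, custom):
    """Save predefined fields, then one extension row per custom value."""
    cur = conn.execute(
        "INSERT INTO records (tenant_id, name) VALUES (?, ?)", (tenant_id, name))
    record_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO extension_values VALUES (?, ?, ?)",
        [(record_id, ext_id, str(value)) for ext_id, value in custom.items()])
    return record_id


def load_custom_fields(record_id):
    """Join values to their metadata and cast each back to its declared type."""
    rows = conn.execute(
        """SELECT m.label, m.data_type, v.value
           FROM extension_values v JOIN extension_meta m USING (extension_id)
           WHERE v.record_id = ?""", (record_id,))
    return {label: CASTS[data_type](value) for label, data_type, value in rows}


zip_field = define_field(1017, "Originating ZIP Code", "int")
code_field = define_field(1017, "Classification Code", "string")
rec = save_record(1017, "Job 42", {zip_field: 98052, code_field: "AX-7"})
fields = load_custom_fields(rec)
```

The extra join in `load_custom_fields` is a small example of the added query complexity that this pattern trades for flexibility.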
Custom Columns
The simplest kind of extensible data model is one in which columns can be added to tenants' tables directly.
Figure 13. Custom columns can be added to a dedicated table without altering the data model for other tenants
This pattern is appropriate for separate-database or separate-schema applications, because each tenant has its own set of tables that can be modified independently of those belonging to any other clients. From a data model standpoint, this is the simplest of the three extensibility patterns, because it does not require you to track data extensions separately. On the application architecture side, though, this pattern can sometimes be more difficult to implement, because it allows tenants to vary the number of columns in a table. Even if the Custom Columns pattern is available to you, you may consider using a variation on the Preallocated Fields or Name-Value Pairs pattern to reduce development effort, allowing you to write application code that can assume a known and unchanging number of fields in each table.
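A minimal sketch of the Custom Columns pattern follows, assuming a separate database per tenant. SQLite in-memory databases are used only as a convenient stand-in; the `jobs` table, the `add_custom_column` helper, and the column name `classification_code` are hypothetical. The sketch also shows why application code becomes harder to write: it has to discover each tenant's columns at run time.

```python
import re
import sqlite3

ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}


def new_tenant_db():
    """Create a dedicated database with the default data model."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, title TEXT)")
    return db


def add_custom_column(db, column, sql_type):
    """Add a tenant-specific column to that tenant's own table."""
    # Identifiers cannot be bound as SQL parameters, so validate them first.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", column):
        raise ValueError(f"invalid column name: {column!r}")
    if sql_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {sql_type!r}")
    db.execute(f"ALTER TABLE jobs ADD COLUMN {column} {sql_type}")


def column_names(db):
    """Discover the current columns, which may differ from tenant to tenant."""
    return [row[1] for row in db.execute("PRAGMA table_info(jobs)")]


tenant_a = new_tenant_db()
tenant_b = new_tenant_db()
add_custom_column(tenant_a, "classification_code", "TEXT")
```

After this runs, only tenant A's table has the extra column; tenant B's data model is unchanged.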
Using Data Model Extensions
Whatever method you use to create an extensible data model, it must be paired with a mechanism for integrating the additional fields into the application's functionality. Any custom field implemented by a customer will require a corresponding modification to the business logic (so the application can use the custom data), the presentation logic (so that users have a way to enter the custom data as input and receive it as output), or both. The configuration interface you present to the customer should therefore provide ways to modify all three (the data model, the business logic, and the presentation logic), preferably in an integrated fashion. (Providing mechanisms through which customers may modify the business logic and user interface will be addressed in a future article in this series.)
Scalability Patterns
Large-scale enterprise software is intended to be used by thousands of people simultaneously. If you have experience building enterprise applications of this sort, you know first-hand the challenges of creating a scalable architecture. For a SaaS application, scalability is even more important, because you'll have to support data belonging to all your customers. For independent software vendors (ISVs) accustomed to building on-premise enterprise software, supporting this kind of user base is like moving from the minor leagues to the majors: the rules may be familiar, but the game is played on an entirely different level. Instead of a widely deployed, business-critical enterprise application, you're really building an Internet-scale system that needs to actively support a user base potentially numbering in the millions.
Databases can be scaled up (by moving to a larger server that uses more powerful processors, more memory, and quicker disk drives) and scaled out (by partitioning a database onto multiple servers). Different strategies are appropriate when scaling a shared database versus scaling dedicated databases. (When developing a scaling strategy, it's important to distinguish between scaling your application (increasing the total workload the application can accommodate) and scaling your data (increasing your capacity for storing and working with data). This article focuses on scaling data specifically.)
Scaling Techniques
The two main tools to use when scaling out a database are replication and partitioning. Replication involves copying all or part of a database to another location, and then keeping the copy or copies synchronized with the original. Single-master replication, in which only the original (or replication master) can be written to, is much easier to manage than multi-master replication, in which some or all of the copies can be written to and some kind of synchronization mechanism is used to reconcile changes between different copies of the data.
Partitioning involves pruning subsets of the data from a database and moving the pruned data to other databases or other tables in the same database. You can partition a database by relocating whole tables, or by splitting one or more tables up into smaller tables horizontally or vertically. Horizontal partitioning means that the database is divided into two or more smaller databases using the same schema and structure, but with fewer rows in each table. Vertical partitioning means that one or more individual tables are divided into smaller tables with the same number of rows, but with each table containing a subset of the columns from the original. Replication and partitioning are often used in combination with one another when scaling databases.
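The difference between the two kinds of partitioning can be illustrated with a small Python sketch that treats a table as a list of rows. The sample rows and the two helper functions are hypothetical. The sketch assumes that each vertical partition repeats the primary key so that the pieces can be joined back together.

```python
# A toy "table": each dict is one row (sample data, purely illustrative).
rows = [
    {"id": 1, "tenant_id": 1017, "name": "Job A", "notes": "long text"},
    {"id": 2, "tenant_id": 2040, "name": "Job B", "notes": "more text"},
    {"id": 3, "tenant_id": 1017, "name": "Job C", "notes": "and more"},
]


def partition_horizontally(rows, key):
    """Same columns everywhere, fewer rows in each partition."""
    parts = {}
    for row in rows:
        parts.setdefault(row[key], []).append(row)
    return parts


def partition_vertically(rows, primary_key, column_groups):
    """Same number of rows everywhere, a subset of columns in each partition."""
    return [
        [{col: row[col] for col in (primary_key, *group)} for row in rows]
        for group in column_groups
    ]


by_tenant = partition_horizontally(rows, "tenant_id")
core, detail = partition_vertically(rows, "id", [("tenant_id", "name"), ("notes",)])
```

Here `by_tenant` holds one smaller table per tenant, while `core` and `detail` each hold all three rows but only some of the columns.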
Tenant-Based Horizontal Partitioning
A shared database should be scaled when it can no longer meet baseline performance metrics, as when too many users are trying to access the database concurrently or the size of the database is causing queries and updates to take too long to execute, or when operational maintenance tasks start to affect data availability.
The simplest way to scale out a shared database is through horizontal (row-based) partitioning based on tenant ID. SaaS shared databases are well-suited to horizontal partitioning because each tenant has its own set of data, so you can easily target individual tenant data and move it.
However, don't assume that if you have 100 tenants and want to partition the database five ways, you can simply count off 20 tenants at a time and move them. Different tenants can place radically different demands on an application, and it's important to plan carefully to avoid simply creating smaller, but still overtaxed, partitions while other partitions go underused.
If you're experiencing application performance problems because too many end users are accessing the database concurrently, consider partitioning the database to equalize the total number of active end-user accounts on each server. For example, if your existing database serves tenants A and B with 600 active users each, and tenants C, D, and E with 400 active users each, you could partition the database by moving tenants C, D, and E to a new server; both databases would then serve 1200 users each.
If you're experiencing problems relating to the size of the database, such as the length of time it takes to perform queries, a more effective partition method might be to target database size instead, assigning tenants to database servers in such a way as to roughly equalize the amount of data on each one.
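Whichever metric you balance on (active users or data size), the assignment problem has the same shape. The following Python sketch searches every possible assignment and keeps the one with the lightest "heaviest server." It is a brute-force illustration under stated assumptions, not a recommended algorithm: the search is exponential in the number of tenants, and the `best_assignment` function is hypothetical. The sample figures are the active-user counts from the example above.

```python
from itertools import product


def best_assignment(loads, servers):
    """Try every tenant-to-server assignment; minimize the heaviest server.

    `loads` maps tenant -> metric (active users, megabytes, and so on).
    Exponential in the number of tenants, so suitable only for small examples.
    """
    tenants = sorted(loads)
    best, best_peak = None, None
    for choice in product(range(servers), repeat=len(tenants)):
        totals = [0] * servers
        for tenant, server in zip(tenants, choice):
            totals[server] += loads[tenant]
        peak = max(totals)
        if best_peak is None or peak < best_peak:
            best, best_peak = dict(zip(tenants, choice)), peak
    return best, best_peak


# Active-user counts taken from the example in the text.
active_users = {"A": 600, "B": 600, "C": 400, "D": 400, "E": 400}
assignment, peak = best_assignment(active_users, servers=2)
```

For these figures the search keeps tenants A and B together and places C, D, and E on the other server, matching the 1200-user split described above. Passing data sizes instead of user counts balances storage in the same way.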
The partitioning method you choose can have a significant impact on application development. Whichever method you choose, it's important that you can accurately survey and report on whatever metrics you intend to use to make partitioning decisions. Building support for monitoring into your application will help you get an accurate view of your tenants' usage patterns and needs. Also, it's likely that you'll need to repartition your data periodically, as your tenants evolve and change the way they work. Choose a partitioning strategy that you can execute when needed without unduly affecting production systems.
Occasionally, a tenant may have enough users or use enough data to justify moving the tenant to a dedicated database of its own. See the next section, "Single Tenant Scaleout," for help performing further scaling.
The Tenant-based Horizontal Partitioning pattern is appropriate for use with shared-schema applications, which impose some unusual constraints on the familiar task of scaling a database. It provides a way to scale a shared database while avoiding actions that will break the application or harm performance (like, for example, splitting a tenant's data across two or more servers inadvertently or unnecessarily).
Single Tenant Scaleout
If some or all tenants store and use a large amount of data, tenant databases may grow large enough to justify devoting an entire server to a single database that serves a single tenant. The scalability challenges in this scenario are similar to those facing architects of traditional single-tenant applications. With a large database on a dedicated server, scaling up is the easiest way to accommodate continued growth.
If the database continues to grow, eventually it will no longer be cost-effective to move it to a more powerful server, and you will have to scale out by partitioning the database onto one or more additional servers. Scaling out a dedicated database is different from scaling out a shared one. With a shared database, the most effective method of scaling involves moving entire sets of tenant data from one database to another, so the nature of the data model that you use isn't particularly relevant. When scaling a database that's dedicated to a single tenant, it becomes necessary to analyze the kinds of data that are being stored to determine the best approach.
The article Scaling Out SQL Server 2005 contains additional guidance and suggestions about analyzing data for scaling out. The article explains reference data, activity data, and resource data in detail, gives some guidelines for replicating and partitioning data, and explains some additional factors that affect scaleout. Some of the scaleout guidelines to consider:
Use replication to create read-only copies of data that doesn't change very often. Some kinds of data rarely or never change after the data is entered, such as part numbers or employee Social Security numbers. Other kinds of data are subject to active change for a defined period of time and then archived, such as purchase orders. These kinds of data are ideal candidates for one-way replication to any databases from which they might be referenced.
Location, location, location. Keep data close to other data that references it. ("Close" in this sense generally means logically proximate rather than physically proximate, although logical proximity often implies physical proximity as well.) Consider the relationships between different kinds of data when deciding whether to separate them, and use replication to distribute read-only copies of reference data among different databases when appropriate.
For example, if the act of retrieving a customer record routinely involves selecting the customer's recent purchase orders from a different table, try to keep the two tables in the same database, or use replication to create copies of appropriate kinds of data. Try to find natural divisions in the data that will minimize the amount of cross-database communication that needs to take place. For example, data associated with particular places can often be partitioned geographically.
Identify data that shouldn't be partitioned. Resource data, such as warehouse inventory levels, are usually poor candidates for replication or partitioning. Use scaleout techniques to move other data off the server, leaving your resource data more room to grow. If you have moved all the data you can and still experience problems, consider scaling up to a bigger server for the resource data.
Use single-master replication whenever possible. Synchronizing changes to multiple copies of the same data is difficult, so avoid using multi-master replication if you can. When replicated data must be changed, only allow changes to be written to the master copy.
This pattern can apply to all three approaches, but only comes into play when an individual tenant's data needs cannot be accommodated by a single server. With the separate-database approach, if tenants' data storage needs are modest, each individual server might host dozens of databases; in that case scaling a particular server involves simply moving one or more databases to a new server and modifying the application's metadata to reflect the new data location.
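The last step mentioned above, updating the application's metadata after a database has been moved, might look something like the following Python sketch. The `tenant_locations` catalog, the server names, and the helper functions are all hypothetical; a real system would keep this mapping in a metadata store and would copy the data before switching the pointer.

```python
# Hypothetical catalog mapping each tenant's database to the server hosting it.
tenant_locations = {
    "tenant_a": "db-server-1",
    "tenant_b": "db-server-1",
    "tenant_c": "db-server-1",
}


def server_for(tenant):
    """Resolve which server the application should connect to for a tenant."""
    return tenant_locations[tenant]


def tenants_on(server):
    """List the tenants currently hosted on a given server."""
    return sorted(t for t, s in tenant_locations.items() if s == server)


def relocate(tenant, new_server):
    """Record a tenant's new location (the data copy itself is out of scope)."""
    if tenant not in tenant_locations:
        raise KeyError(tenant)
    tenant_locations[tenant] = new_server


# Relieve db-server-1 by moving one tenant's database to a new server.
relocate("tenant_c", "db-server-2")
```

Because every connection is resolved through `server_for`, the rest of the application does not need to know that the move took place.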
Conclusion
The design approaches and patterns we've discussed in this article should help you create the foundation layer of trust that's vital to the success of your SaaS application. Designing a SaaS data architecture that reconciles the competing benefits and demands of sharing and isolation isn't a trivial task, but these approaches and patterns should help you identify and resolve many of the critical questions you will face. The ideas and recommendations presented here differ in the details, but they all help you leverage the principles of configurability, scalability, and multi-tenant efficiency to design a secure and extensible data architecture for a SaaS application.
This article is by no means the last word in single-instance, multi-tenant data architecture. Later in this series, we'll look at ways you can help tenants put their data model extensions to good use through presentation and workflow customization.
Related Guidance
Developing Multi-tenant Applications for the Cloud on Windows Azure
Feedback
The authors gladly welcome your feedback about this paper. Please email all feedback to fredch@microsoft.com, gianpc@microsoft.com, or rwolter@microsoft.com. Thank you.
Architecture Strategies for Catching the Long Tail (Software-as-a-Service (SaaS))
Frederick Chong and Gianpaolo Carraro
Microsoft Corporation
April 2006
Applies to:
Application Architecture
Software-as-a-Service (SaaS)
Summary: This paper provides an overview of the software-as-a-service (SaaS) model for software delivery, provides a high-level description of the architecture of a SaaS application, and discusses the challenges and benefits of developing and offering SaaS. (26 printed pages)
For further reference, an ARCast is available:
Contents
Introduction
What Is Software as a Service?
Thinking About Software as a Service
Changing the Business Model
The Three Attributes of a Single-Instance Multi-Tenant Architecture
The Software as a Service Maturity Model
Choosing a Maturity Level
High-Level Architecture
Metadata Services
Security Services
Authentication
Authorization
A Closer Look: Multi-Tenant Data Model
Scalability
Operational Structure
Shared Services
Monitoring
Conclusion
Acknowledgements
Feedback
Introduction
Software as a service. The words are on everyone's lips. The pages of software industry publications are full of articles about software as a service (SaaS)—articles that use words like "revolution" and "horizon" (as in, "on the…"). Everyone knows (or thinks they know) what it is, roughly, and everyone knows it's going to be big. Yet few people would say they can really define it, and even fewer know how to build it.
So, if SaaS holds such promise for the future of application delivery, why isn't there more guidance available to help people actually achieve it?
We believe that SaaS is going to have a major impact on the software industry, because software as a service will change the way people build, sell, buy, and use software. For this to happen, though, software vendors need resources and information about developing SaaS applications effectively.
This is the first in a series of papers from Microsoft dedicated to demystifying SaaS and providing practical, real-world guidance for architecting SaaS applications. This paper serves as an overview of SaaS, its challenges, and its benefits for those who are interested in offering SaaS. Future papers will explore many of these topics in detail.
This paper begins by asking just what software as a service is, exactly, and it explains the conceptual shifts that prospective SaaS vendors must experience in order to understand how it differs from traditional, on-premise software. Next, we'll look at the SaaS business model, to see how software as a service can be monetized in the real world.
Because this is an architectural paper, the largest section addresses the architecture of a SaaS application. We present a four-level maturity model that explains and puts into perspective some key attributes of SaaS: configurability, multi-tenant efficiency, and scalability. We'll examine the components of a high-level SaaS architecture, and then take a closer look at a typical challenge the SaaS architect faces: that of providing a mechanism for extending the data model of a multi-tenant application.
Lastly, we'll take a brief look at some of the operational issues involved in supporting a SaaS application after deployment.
What Is Software as a Service?
Even today, the exact definition of software as a service (SaaS) is open to debate, and asking five people would probably result in five different definitions. Still, most experts would probably agree on a few fundamental principles that distinguish SaaS from traditional packaged software on the one hand, and simple websites on the other. Expressed most simply, software as a service can be characterized as follows:
"Software deployed as a hosted service and accessed over the Internet."
Take a moment to consider the implications of this definition. It doesn't prescribe any specific application architecture; it doesn't say anything about specific technologies or protocols; it doesn't draw a distinction between business-oriented and consumer-oriented services, or require specific business models. According to this definition, the key distinguishing features of software as a service are where the application code resides, and how it is deployed and accessed.
(Is this definition a little simplistic? In a word… yes. Later, we'll focus on some of the attributes that define and distinguish a well-designed, mature SaaS application.)
By this definition, SaaS includes a number of services and applications that you may not expect to find in this category. For example, consider Web-based e-mail services, such as Microsoft Hotmail. Although Hotmail might not be the first example that comes to mind when you think about SaaS, it meets all of the basic criteria: a vendor hosts all of the program logic and data, and provides end users with access to this data over the public Internet, through a Web-based user interface.
Moving from the general to specific, we can identify two major categories of software as a service:
- Line-of-business services, offered to enterprises and organizations of all sizes. Line-of-business services are often large, customizable business solutions aimed at facilitating business processes such as finances, supply-chain management, and customer relations. These services are typically sold to customers on a subscription basis.
- Consumer-oriented services, offered to the general public. Consumer-oriented services are sometimes sold on a subscription basis, but are often provided to consumers at no cost, and are supported by advertising.
This paper focuses on the architecture and business issues involved in developing line-of-business applications, and the concepts and examples herein are presented in that context. However, issues such as multi-tenant customization and extensibility, data scaling, and isolation issues also occur (and in fact tend to be easier to resolve) in the consumer space, so developers of consumer-oriented SaaS offerings may benefit from reading it as well.
Thinking About Software as a Service
Moving from offering on-premise software to offering software as a service requires software vendors to shift their thinking in three interrelated areas: in the business model, in the application architecture, and in the operational structure (see Figure 1).
Figure 1. Areas in which software vendors need to shift their thinking
In the following three sections, we'll take a closer look at each of these shifts, focusing primarily on the application architecture aspect of SaaS.
Changing the Business Model
Changing the business model could involve one or more of the following:
- Shifting the "ownership" of the software from the customer to an external provider.
- Reallocating responsibility for the technology infrastructure and management—that is, hardware and professional services—from the customer to the provider.
- Reducing the cost of providing software services, through specialization and economy of scale.
- Targeting the "long tail" of smaller businesses, by reducing the minimum cost at which software can be sold.
Realizing the benefits of SaaS requires shifts in thinking on the part of both the provider and the customer, and it's up to the provider to help the customer make this shift.
Who "Owns" the Software?
Most software continues to be sold in the same way it has been sold for decades. The customer buys a license to use the software, and installs it on hardware that belongs to the customer or that is otherwise under the customer's control, with the vendor providing support as directed by the terms of the license or a support agreement. In an honest, above-board software transaction, the notion of a "license" can seem like something of a technicality: legally, the customer is only purchasing the right to use a copy of the software, but for practical purposes, it's as though the customer "owns" the software and may use it as often and for as long as it wishes.
With the software-as-a-product model providing the context for the software market, the idea of software as a service can feel somewhat alien: instead of "owning" important software outright, customers are told, they can pay for a subscription to software running on someone else's servers, software that goes away if they stop subscribing. It's therefore especially important that the prospective customer understand how SaaS provides a direct and quantifiable economic benefit over the traditional model.
Transferring IT Responsibilities
In a typical organization, the information technology (IT) budget is spent in three broad areas:
- Software—The actual programs and data that the organization uses for computing and information processing.
- Hardware—The desktop computers, servers, networking components, and mobile devices that provide users with access to the software.
- Professional services—The people and institutions that ensure the continued operation and availability of the system, including technical support staff, consultants, and vendor representatives.
Of these three, it is the software that is most directly involved in information management, which is the ultimate goal of any IT organization. Hardware and professional services, though vital and important components of the IT environment, are properly considered means to an end, in that they make it possible for the software to produce the desired end result of effective information management. (To put it another way, any organization would gladly add software functionality without extra hardware if it could do so effectively, but no organization would simply add hardware without an anticipated need to add software as well.)
In an IT environment based around on-premise software, the majority of the budget is typically spent on hardware and professional services, leaving a minority of the budget available for software (see Figure 2).
Figure 2. Typical budget for an on-premise software environment
In this model, the software budget is spent primarily on licensed copies of "shrink-wrapped" business software and customized line-of-business software. The hardware budget goes toward desktop and mobile computers for end users, servers to host data and applications, and components to network them together. The professional services budget pays for a support staff to deploy and support software and hardware, as well as consultants and development resources to help design and build custom systems.
Note The proportions shown in these diagrams are for illustrative purposes only; they are not intended to advocate any specific allocation of resources, and your allocation may differ significantly.
In an organization relying chiefly on SaaS, the IT budget allocation looks much different (see Figure 3).
Figure 3. Typical budget for a SaaS environment
In this model, the SaaS vendor hosts critical applications and associated data on central servers at the vendor's location, and it supports the hardware and software with a dedicated support staff. This relieves the customer organization from the responsibility for supporting the hosted software, and for purchasing and maintaining server hardware for it. Moreover, applications delivered over the Web or through smart clients place significantly less demand on a desktop computer than traditional locally-installed applications, which enables the customer to extend the desktop technology lifecycle significantly. The end result is that a much larger percentage of the IT budget is available to spend on software, typically in the form of subscription fees to SaaS providers.
Leveraging Economy of Scale
But isn't this result just an illusion? After all, a percentage of the subscription fees paid to SaaS vendors for "software" has to pay for hardware and professional services for the vendor. The answer lies in the economy of scale. A SaaS vendor with x customers subscribing to a single, centrally hosted software service can serve all of those customers in a consolidated environment. For example, a line-of-business SaaS application installed in a load-balanced farm of five servers may be able to support 50 medium-sized customers, meaning that each customer would only be responsible for a tenth of the cost of a server. A similar application installed locally might require each customer to dedicate an entire server to the application (perhaps more than one, if load balancing and high availability are concerns). This represents a substantial potential savings over the traditional model, and for SaaS applications that are built to scale well, the operating cost for each customer will continue to drop as more customers are added. As this is happening, the provider will develop multi-tenancy as a core competency, leading to higher-quality offerings at a lower cost. Therefore, even accounting for the hardware and professional services costs incurred by SaaS vendors, customers can still obtain significantly greater pure software functionality for the same IT budget (see Figure 4).
Figure 4. Typical budget for a SaaS environment (accounting for hardware and professional services costs)
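The server arithmetic in the example above can be checked with a short Python sketch. It deliberately models only the server share per customer, using the figures from the text (five servers, 50 customers, one dedicated server on-premise); the function name is hypothetical, and real costs would include many factors that this ignores.

```python
from fractions import Fraction


def servers_per_customer(servers, customers):
    """Share of server capacity attributable to each customer."""
    return Fraction(servers, customers)


# Figures from the text: a five-server farm shared by 50 customers,
# compared with one dedicated server per on-premise customer.
shared = servers_per_customer(5, 50)
on_premise = Fraction(1)
ratio = on_premise / shared
```

Under these assumptions `shared` is one tenth of a server, so each on-premise customer carries ten times the server cost of a customer in the shared farm.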
Selling to the Long Tail
With his article "The Long Tail," in the October 2004 issue of Wired (http://www.wired.com/wired/archive/12.10/tail.html), writer Chris Anderson popularized the idea of the "long tail" in explaining why online retailers such as Amazon.com are uniquely positioned to fill a huge demand that traditional retailers cannot serve cost-effectively (see Figure 5).
Figure 5. The "long tail"
Demand for categories of merchandise such as books or compact discs tends to follow what is known as a "power law distribution." In this type of scenario, thousands of books, CDs, and DVDs are published every year, but only a few dozen titles ever rise to the level of bestseller. The rest languish in the so-called long tail: the huge number of smaller releases with specialty appeal that can never hope to sell more than a few thousand copies, perhaps not even that many.
Traditional "brick-and-mortar" retailers concentrate on selling the most popular items, because they can't possibly stock copies of each of the millions of books, CDs, and DVDs in print. Online retailers, however, don't have to worry about limited shelf space; shipping items to customers directly from large warehouses around the world, they can advertise and sell the millionth most popular title as easily as the most popular one. Access to this long tail of low-volume sales translates into a huge amount of revenue.
A large brick-and-mortar bookstore might carry about 130,000 different titles on its shelves. Yet, according to Anderson, the majority of Amazon.com's book sales come from outside its top 130,000 titles; in other words, most of the books that Amazon.com sells are titles that wouldn't even be carried by a traditional walk-in bookstore.
Vendors of complex line-of-business (LOB) software solutions face a similar market curve (see Figure 6).
Figure 6. Market curve for LOB software vendors
In contrast to simpler, shrink-wrapped software packages, line-of-business software tends to be custom-tailored to meet individual customers' needs—potentially including on-site installation and service visits from vendor service teams—and often requires dedicated server hardware, and support staff to manage it. The cost of providing this kind of dedicated attention contributes to the minimum price at which the vendor can afford to sell the software. Such software therefore tends to be marketed toward larger businesses that can afford to pay for this level of attention. But for every large enterprise that purchases a line-of-business solution, there are dozens of smaller and medium-sized businesses that could benefit from such a solution, but that cannot afford the expense.
Figure 7. New market opened by lower cost of SaaS
By eliminating much of the upkeep, and using economies of scale to combine and centralize customers' hardware and services requirements, SaaS vendors can offer solutions at a much lower cost than traditional vendors, not only in monetary terms, but also by greatly reducing the need for customers to add complexity to their IT infrastructure. This gives SaaS exclusive access to an entirely new range of potential customers that have always been inaccessible to traditional solution vendors, because it has never before been cost-effective to serve them (see Figure 7).
Effectively targeting these smaller customers requires another shift in thinking for vendors who are accustomed to a sales process that depends on personal contacts and vendor–customer relationships; most vendors won't be able to provide personal service to a much larger customer base at price points that such a base will support. Selling SaaS is like selling mobile phone ringtones, or downloadable music: it should be possible for a customer to visit your website, subscribe to your service, pay with a credit card, customize the service, and begin using it, all without human intervention on the part of the vendor. This doesn't mean that you have to eliminate the more personal approach for larger customers with more extensive needs. But designing the sales, marketing, provisioning, and customization processes from the ground up to work automatically makes it possible to offer an automated approach as a choice—and has the happy side effect of simplifying the work that your own support personnel must perform in order to accomplish the same tasks on behalf of a customer.
Application Architecture
Our working definition of software as a service is: "Software deployed as a hosted service and accessed over the Internet." Depending on how one defines words such as software and access, this definition can encompass a lot of things… perhaps too many. To an application architect, certainly, it doesn't really shed any light on what exactly makes a SaaS application work, the thing that makes the difference between a successful SaaS application and an unsuccessful one. A line-of-business application with a decade-old code base mated to a jury-rigged HTML front end may fit the broad definition of software as a service, but most such applications run into problems when they are unable to scale well or cost-effectively. To define what might be called a mature SaaS application, therefore, we must introduce some additional criteria.
The Three Attributes of a Single-Instance Multi-Tenant Architecture
From an application architect's point of view, there are three key differentiators that separate a well-designed SaaS application from a poorly designed one. A well-designed SaaS application is scalable, multi-tenant-efficient, and configurable.
Scaling the application means maximizing concurrency, and using application resources more efficiently—for example, optimizing locking duration, statelessness, sharing pooled resources such as threads and network connections, caching reference data, and partitioning large databases.
Multi-tenancy may be the most significant paradigm shift that an architect accustomed to designing isolated, single-tenant applications has to make. For example, when a user at one company accesses customer information by using a CRM application service, the application instance that the user connects to may be accommodating users from dozens, or even hundreds, of other companies—all completely unbeknownst to any of the users. This requires an architecture that maximizes the sharing of resources across tenants, but that is still able to differentiate data belonging to different customers.
Of course, if a single application instance on a single server has to accommodate users from several different companies at once, you can't simply write custom code to customize the end-user experience—anything you do to customize the application for one customer will change the application for other customers as well. Instead of customizing the application in the traditional sense, then, each customer uses metadata to configure the way the application appears and behaves for its users. The challenge for the SaaS architect is to ensure that the task of configuring applications is simple and easy for the customers, without incurring extra development or operation costs for each configuration.
The Software as a Service Maturity Model
We've enhanced our working definition of SaaS by identifying the important attributes of a mature SaaS application. But maturity isn't an all-or-nothing proposition. An application can possess just one or two of these attributes and still meet all necessary business requirements, in which case the application architects may actively choose not to fulfill the other attributes, if doing so would not be cost-effective.
Broadly speaking, SaaS application maturity can be expressed using a model with four distinct levels. Each level is distinguished from the previous one by the addition of one of the three attributes listed above.
Figure 8. Four-level SaaS maturity model
Level I: Ad Hoc/Custom
The first level of maturity is similar to the traditional application service provider (ASP) model of software delivery, dating back to the 1990s. At this level, each customer has its own customized version of the hosted application, and runs its own instance of the application on the host's servers. Architecturally, software at this maturity level is very similar to traditionally-sold line-of-business software, in that different clients within an organization connect to a single instance running on the server, but that instance is wholly independent of any other instances or processes that the host is running on behalf of its other customers.
Typically, traditional client–server applications can be moved to a SaaS model at the first level of maturity, with relatively little development effort, and without re-architecting the entire system from the ground up. Although this level offers few of the benefits of a fully mature SaaS solution, it does allow vendors to reduce costs by consolidating server hardware and administration.
Level II: Configurable
At the second level of maturity, the vendor hosts a separate instance of the application for each customer (or tenant). Whereas in the first level each instance is individually customized for the tenant, at this level, all instances use the same code implementation, and the vendor meets customers' needs by providing detailed configuration options that allow the customer to change how the application looks and behaves to its users. Despite being identical to one another at the code level, each instance remains wholly isolated from all the others.
Moving to a single code base for all of a vendor's customers greatly reduces a SaaS application's service requirements, because any changes made to the code base can be easily provided to all of the vendor's customers at once, thereby eliminating the need to upgrade or slipstream individual customized instances. However, repositioning a traditional application as SaaS at the second maturity level can require significantly more re-architecting than at the first level, if the application has been designed for individual customization rather than configuration metadata.
Similarly to the first maturity level, the second level requires that the vendor provide sufficient hardware and storage to support a potentially large number of application instances running concurrently.
Level III: Configurable, Multi-Tenant-Efficient
At the third level of maturity, the vendor runs a single instance that serves every customer, with configurable metadata providing a unique user experience and feature set for each one. Authorization and security policies ensure that each customer's data is kept separate from that of other customers; and, from the end user's perspective, there is no indication that the application instance is being shared among multiple tenants.
This approach eliminates the need to provide server space for as many instances as the vendor has customers, allowing for much more efficient use of computing resources than the second level, which translates directly to lower costs. A significant disadvantage of this approach is that the scalability of the application is limited. Unless partitioning is used to manage database performance, the application can be scaled only by moving it to a more powerful server (scaling up), until diminishing returns make it impossible to add more power cost-effectively.
Level IV: Scalable, Configurable, Multi-Tenant-Efficient
At the fourth and final level of maturity, the vendor hosts multiple customers on a load-balanced farm of identical instances, with each customer's data kept separate, and with configurable metadata providing a unique user experience and feature set for each customer. A system at this level can scale to an arbitrarily large number of customers, because the number of servers and instances on the back end can be increased or decreased as necessary to match demand, without requiring additional re-architecting of the application. Changes or fixes can be rolled out to thousands of tenants as easily as to a single tenant.
Choosing a Maturity Level
What maturity level should you target for your application? One might expect the fourth level to be the ultimate goal for any SaaS application, but this isn't always the case. It may be more helpful to think of SaaS maturity as a continuum between isolated data and code on one end, and shared data and code on the other (see Figure 9).
Figure 9. SaaS maturity as a continuum
Where your application should fall along this continuum depends on your business, architectural, and operational needs, and on customer considerations. As you'll be able to see even from this simple explanation, all of these considerations are interrelated to some degree.
- Business model—Does an isolated approach make financial sense? Forsaking the economic and management benefits of a shared approach means offering your application to the consumer at a higher cost; however, under some circumstances, it may be worth it to meet other needs. In addition, customers may have strong legal or cultural resistance to an architectural model in which multiple tenants share access to an application, even if you can demonstrate that it does not place confidential data at risk. Ultimately, of course, you'll need a business model that shows how your application can make money at whichever maturity level you've targeted.
- Architectural model—Can your application be made to run in a single logical instance? If you are seeking to move a desktop-based or traditional client–server application to an Internet-based delivery system, it may be fundamentally incompatible with a single-instance, metadata-centric approach, and you may determine that it will never make financial sense to invest the development effort necessary to transform it into a fully mature SaaS application. If you are designing and building a net-native application from the ground up, you will probably have a lot more freedom to take a single-instance approach.
- Operational model—Can you guarantee your service level agreements (SLAs) without isolation? Carefully examine the obligations imposed by any existing SLAs that you have with customers, with regard to considerations such as downtime, support options, and disaster recovery, and determine whether these obligations can be met under an application architecture in which multiple unrelated customers share access to a single application instance.
High-Level Architecture
Architecturally, SaaS applications are largely similar to other applications built using service-oriented design principles (see Figure 10).
Figure 10. SaaS application architecture
Most of the components depicted in Figure 10 should be familiar to most application architects. The process services expose interfaces that smart clients and/or the Web presentation tier can invoke, and kick off a synchronous workflow or long-running transaction that will invoke other business services, which interact with the respective data stores in order to read and write business data. Security services are responsible for controlling access to end-user and back-end software services.
The most significant difference is the addition of metadata services, which are responsible for managing application configuration for individual tenants. Services and smart clients interact with the metadata services in order to retrieve information that describes configurations and extensions that are specific to each tenant.
Metadata Services
In a mature SaaS application, the metadata service provides customers with the primary means of customizing and configuring the application to meet their needs. Typically, customers can make configuration changes in four broad areas:
- User interface and branding—Customers often appreciate the ability to modify the user interface to reflect their corporate branding, and therefore SaaS applications typically offer features that allow customers to change things such as graphics, colors, fonts, and so on.
- Workflow and business rules—To be of use to a wide range of potential customers, a business-critical SaaS application has to be able to accommodate differences in workflow. For example, one customer of an invoice tracking application may require each invoice to be approved by a manager; a second customer may require each invoice to be approved by two managers in sequence; a third may require two managers to approve each invoice, but allow them to work in parallel. When appropriate, customers should be able to configure the way in which the application's workflow aligns with their business processes.
- Extensions to the data model—For many data-driven SaaS applications, one size definitely doesn't fit all. Even with relatively simple, task-specific applications, customers may chafe under the restrictions imposed by a static, unchanging set of data fields and tables. An extensible data model gives customers the freedom to make an application work their way, instead of forcing them to work its way. Later in this paper, you'll learn a bit more about how a customer-extensible data model is architected.
- Access control—Typically, each customer is responsible for creating individual accounts for end users, and for determining which resources and functions each user should be allowed to access. Access rights and restrictions for each user are tracked by using security policies, which should be configurable by each tenant.
To provide customers with flexibility in configuring the software as necessary, these options are organized into hierarchical configuration units known as scopes, each of which contains options for making changes in each of the four areas listed above. Every customer has a top-level scope that it can configure as needed, and the customer may establish one or more scopes underneath the top level in an arbitrary hierarchy. A relationship strategy determines how and whether child nodes inherit and override configuration settings from parent nodes.
For example, a typical customer that purchases enterprise-wide access to your application may have several business units with distinct needs, all of which must follow certain company-wide standards, but also must be able to configure some aspects of the application individually. Within each business unit as well, there may be organizational groups that have their own special configuration needs. For each of these identified organizational units, the customer can establish a scope that gives the group access to the configuration options that it may set or change.
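A scope hierarchy of this kind can be sketched in a few lines. The class name, method names, and the "locked" mechanism below are hypothetical, and the relationship strategy shown (children inherit everything and may override anything that no ancestor has locked) is only one of several a real metadata service might support.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set


@dataclass
class Scope:
    """A hierarchical configuration unit (illustrative, not a real API)."""
    name: str
    parent: Optional["Scope"] = None
    settings: Dict[str, Any] = field(default_factory=dict)
    # Keys that descendants may not override (company-wide standards).
    locked: Set[str] = field(default_factory=set)

    def configure(self, key: str, value: Any) -> None:
        # Refuse the change if any ancestor scope has locked this key.
        ancestor = self.parent
        while ancestor is not None:
            if key in ancestor.locked:
                raise PermissionError(f"{key!r} is fixed by scope {ancestor.name!r}")
            ancestor = ancestor.parent
        self.settings[key] = value

    def lookup(self, key: str, default: Any = None) -> Any:
        # Walk up the hierarchy to the nearest scope that defines the key.
        scope: Optional["Scope"] = self
        while scope is not None:
            if key in scope.settings:
                return scope.settings[key]
            scope = scope.parent
        return default


# Top-level scope for the customer, with a company-wide branding standard.
company = Scope("Company", settings={"logo": "corporate.png", "currency": "USD"},
                locked={"logo"})
# A business unit that overrides one setting and inherits the rest.
sales_emea = Scope("Sales EMEA", parent=company)
sales_emea.configure("currency", "EUR")

print(sales_emea.lookup("logo"))      # corporate.png
print(sales_emea.lookup("currency"))  # EUR
```

Resolving each setting by walking toward the root keeps child scopes small: a scope stores only what it changes, and everything else follows the parent automatically.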
Unlike traditional vendor-customized line-of-business applications, SaaS applications are much more likely to be configured by customers themselves. Designing the configuration interface is therefore almost as important as designing the interface for end users. Ideally, customers should be able to configure the application through a wizard, or through simple, intuitive screens that present all available options without causing information overload, and that clearly distinguish between options that can and cannot be changed within a given scope.
Security Services
As important as security is in any software context, the nature of SaaS makes security both a paramount concern for customers, and a high priority for application architects. Following some basic guidelines can help ensure that tenants remain in control of their private data.
Authentication
The SaaS provider typically delegates to each tenant the responsibility for creating and maintaining its own user accounts, a process known as delegated administration. Delegated administration creates a situation in which the customer is responsible for creating individual user accounts, but the vendor has to authenticate them. To accommodate this delegated-administration model, SaaS designers use two general approaches for handling authentication: a centralized authentication system, or a decentralized authentication system. The approach that you choose will have ramifications for the complexity of your architecture and the way end users experience the application, and you should consider what your business model says about the needs of the application, customers, and end users when making a decision.
In a centralized authentication system, the provider manages a central user account database that serves all of the application's tenants. Each tenant's administrator is granted permission to create, manage, and delete user accounts for that tenant in the user account directory. A user signing on to the application provides his or her credentials to the application, which authenticates the credentials against the central directory and grants the user access if the credentials are valid (see Figure 11).
Figure 11. Centralized authentication system
This approach requires a relatively simple authentication infrastructure that is comparatively easy to design and implement, and that does not require any changes to the tenant's own user infrastructure. An important disadvantage to this approach is that a centralized authentication system makes it much more difficult to implement single sign-on, in which the application accepts the credentials that the user has already entered to gain access to his or her corporate network. Without single sign-on, users are frequently presented with an inconvenient login prompt when logging in to the application, and they must enter their credentials manually.
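As a rough illustration of the centralized model, the sketch below keeps every tenant's accounts in one provider-side directory and lets only a tenant's own administrators add users. The class and method names are hypothetical, the in-memory dictionaries stand in for a real account database, and the administrator's own sign-in is assumed to have happened before create_user is called.

```python
import hashlib
import hmac
import os


class CentralDirectory:
    """Provider-hosted account store shared by all tenants (illustrative only)."""

    def __init__(self):
        self._accounts = {}  # (tenant_id, username) -> (salt, password_hash)
        self._admins = {}    # tenant_id -> usernames allowed to manage accounts

    @staticmethod
    def _hash(password, salt):
        # Salted, iterated hash so that plain-text passwords are never stored.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def _store(self, tenant_id, username, password):
        salt = os.urandom(16)
        self._accounts[(tenant_id, username)] = (salt, self._hash(password, salt))

    def add_tenant(self, tenant_id, admin, admin_password):
        self._admins[tenant_id] = {admin}
        self._store(tenant_id, admin, admin_password)

    def create_user(self, tenant_id, acting_admin, username, password):
        # Delegated administration: only the tenant's own admins may add users.
        if acting_admin not in self._admins.get(tenant_id, set()):
            raise PermissionError("not an administrator of this tenant")
        self._store(tenant_id, username, password)

    def authenticate(self, tenant_id, username, password):
        record = self._accounts.get((tenant_id, username))
        if record is None:
            return False
        salt, expected = record
        return hmac.compare_digest(expected, self._hash(password, salt))


directory = CentralDirectory()
directory.add_tenant("contoso", "alice", "admin-secret")
directory.create_user("contoso", "alice", "bob", "bob-secret")
print(directory.authenticate("contoso", "bob", "bob-secret"))   # True
print(directory.authenticate("fabrikam", "bob", "bob-secret"))  # False
```

Keying each account by tenant as well as by user name means that two tenants can each have a user called "bob" without any collision.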
In a decentralized authentication system, the tenant deploys a federation service that interfaces with the tenant's own user directory service. When an end user attempts to access the application, the federation service authenticates the user locally and issues a security token, which the SaaS provider's authentication system accepts before allowing the user to access the application (see Figure 12).
Figure 12. Decentralized authentication system
This is an ideal approach when single sign-on is important, because authentication is handled behind the scenes, and it doesn't require the user to remember and enter a special set of credentials. The decentralized approach is more complex than the centralized approach, however, and a SaaS application with thousands of customers will require individual trust relationships with each of the thousands of tenant federation services.
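The token exchange can be sketched as follows. This is a deliberately simplified stand-in: production federation typically relies on standardized token formats and asymmetric signatures, whereas this sketch assumes a shared secret per tenant and an ad hoc token layout. The function names and the trusted_keys mapping (one entry per trust relationship) are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(tenant_id, username, key, ttl_seconds=300, now=None):
    """Run by the tenant's federation service after authenticating the user locally."""
    issued = time.time() if now is None else now
    claims = {"tenant": tenant_id, "user": username, "exp": issued + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode("utf-8"))
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def verify_token(token, trusted_keys, now=None):
    """Run by the SaaS provider; returns the claims, or None if the token is rejected."""
    current = time.time() if now is None else now
    try:
        payload, signature = token.encode("ascii").split(b".")
        claims = json.loads(base64.urlsafe_b64decode(payload))
        key = trusted_keys[claims["tenant"]]  # one trust relationship per tenant
        expires = float(claims["exp"])
    except (ValueError, KeyError, TypeError):
        return None
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(expected, signature):
        return None  # not signed by the tenant the token claims to come from
    if expires < current:
        return None  # token has expired
    return claims


trusted_keys = {"contoso": b"contoso-shared-secret"}
token = issue_token("contoso", "alice", trusted_keys["contoso"], now=1000.0)
print(verify_token(token, trusted_keys, now=1001.0)["user"])  # alice
print(verify_token(token, trusted_keys, now=5000.0))          # None
```

The provider never sees the user's password; it only checks that the token was issued by a federation service it trusts and that the token is still valid. The trusted_keys mapping also makes the scaling cost visible: it needs one entry for every tenant.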
In many cases, the SaaS provider may want to consider a hybrid approach—using the centralized approach to authenticate and manage users of smaller tenants, and the federated approach for larger enterprises that demand, and will pay for, the single sign-on experience.
Authorization
Typically, access to resources and business functions in a SaaS application is managed by using roles that map to specific job functions within an organization. Each role is given one or more permissions that enable users assigned to the role to perform actions in accordance with any relevant business rules (see Figure 13).
Figure 13. Access control
Roles are managed within the SaaS application itself; they can contain individual user accounts, as well as user groups. Individual user accounts and groups can be assigned several different roles as required.
Depending on the roles to which a user is assigned, he or she is granted one or more permissions to perform specific operations or actions. These actions typically map directly to important business functions, or to the management of the application itself. For example, a purchasing application might include permissions for creating, submitting, approving, and rejecting purchase orders; an application for mortgage brokers might include permissions for checking a borrower's credit and granting a loan; and so forth. A single permission can be assigned to one or several roles, as necessary; each user will be granted the union of the permissions assigned to all roles to which the user belongs.
Applications can use business rules to control access to actions and resources at a finer level than permissions allow. Business rules introduce conditions that must be satisfied before access is granted. For example, you can use a business rule that allows a user to transfer funds between different accounts only during normal business hours, or if the amount being transferred does not exceed a certain figure.
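To make the relationship between roles, permissions, and business rules concrete, here is a minimal sketch. The role names, permission names, business hours, and transfer limit are all invented for illustration, and the sketch assumes that every rule attached to an action must pass before access is granted.

```python
from datetime import time

# Each role grants a set of permissions (hypothetical names).
ROLE_PERMISSIONS = {
    "buyer": {"create_po", "submit_po"},
    "manager": {"approve_po", "reject_po"},
    "treasurer": {"transfer_funds"},
}


def effective_permissions(roles):
    """Union of the permissions granted by every role the user holds."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def within_business_hours(context):
    return time(9, 0) <= context["time_of_day"] < time(17, 0)


def under_transfer_limit(context):
    return context["amount"] <= 10_000


# Business rules add conditions on top of a permission.
BUSINESS_RULES = {"transfer_funds": [within_business_hours, under_transfer_limit]}


def is_allowed(roles, action, context):
    if action not in effective_permissions(roles):
        return False
    return all(rule(context) for rule in BUSINESS_RULES.get(action, []))


print(is_allowed({"treasurer"}, "transfer_funds",
                 {"time_of_day": time(10, 30), "amount": 500}))  # True
print(is_allowed({"treasurer"}, "transfer_funds",
                 {"time_of_day": time(20, 0), "amount": 500}))   # False
```

The permission check answers "may this user ever perform this action?", and the business rules answer "may the user perform it right now, with these values?".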
Access control is managed at the scope level. Each scope inherits roles, permissions, and business rules from any parent scopes, according to the application's relationship strategy, and it can modify, add, and delete them as appropriate. For example, consider a customer based in the United States, with a branch office in Toronto, Canada. The root scope has a role named Benefits Administrator that has a number of permissions related to managing employee benefits, including the administration of the company's 401(k) retirement savings plan. Because 401(k) plans are a creation of U.S. tax law, they are not used in Canada. Therefore, a child scope is created for the Canadian office that inherits the Benefits Administrator role and its permissions, with the exception of the permission that allows the role to modify 401(k) offerings. In place of this permission, the customer adds a permission that allows the role to modify Registered Retirement Savings Plan (RRSP) offerings, the Canadian equivalent of the U.S. 401(k).
Figure 14. Example of root scope permissions vs. child scope permissions
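The 401(k) and RRSP example can be expressed as a child scope that records only its differences from the parent. The class below is a hypothetical sketch; the permission names are invented, and "manage_health_plans" stands in for the other benefits-related permissions that the role would carry.

```python
class AccessScope:
    """Scope that inherits role permissions from its parent and records local changes."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.added = {}    # role -> permissions added at this scope
        self.removed = {}  # role -> permissions removed at this scope

    def grant(self, role, *permissions):
        self.added.setdefault(role, set()).update(permissions)
        self.removed.setdefault(role, set()).difference_update(permissions)

    def revoke(self, role, *permissions):
        self.removed.setdefault(role, set()).update(permissions)
        self.added.setdefault(role, set()).difference_update(permissions)

    def permissions(self, role):
        # Start from whatever the parent scope grants, then apply local changes.
        inherited = self.parent.permissions(role) if self.parent else set()
        return (inherited - self.removed.get(role, set())) | self.added.get(role, set())


root = AccessScope("United States (root)")
root.grant("Benefits Administrator", "manage_health_plans", "modify_401k_offerings")

toronto = AccessScope("Toronto office", parent=root)
toronto.revoke("Benefits Administrator", "modify_401k_offerings")
toronto.grant("Benefits Administrator", "modify_rrsp_offerings")

print(sorted(root.permissions("Benefits Administrator")))
# ['manage_health_plans', 'modify_401k_offerings']
print(sorted(toronto.permissions("Benefits Administrator")))
# ['manage_health_plans', 'modify_rrsp_offerings']
```

Because the child stores only its differences, a permission later added to the role at the root becomes available in Toronto without any further configuration.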
As a best practice, your application should include a default set of roles, permissions, and business rules that are available to all tenants, and it should allow individual tenants to customize these rules and create more rules through a useful and intuitive user interface.
A Closer Look: Multi-Tenant Data Model
Thus far, we've been covering application architecture at a fairly high level, so let's examine a particular challenge in greater detail: that of creating a data model that customers can extend in a multi-tenant environment. This is by no means a comprehensive exploration of the data model extension process, but it should help give you an idea of the kinds of architectural issues that must be considered when designing SaaS applications.
As designed, your application will naturally include a standard database setup, with default tables, fields, queries, and relationships that are appropriate to the nature of your solution. But different organizations have their own unique needs, which a rigid, inextensible default data model won't be able to address. For example, one customer of a SaaS job-tracking system might have to store an externally-generated classification code string with each record in order to fully integrate the system with their other processes. A different customer may have no need for a classification string field, but might require support for tracking a category ID number, an integer. Therefore, in all but a few specialized cases, you will have to develop and implement a method by which customers can extend your default data model to meet their needs, without affecting the data model as used by other customers. We'll look at three general approaches to solving this problem: a dedicated tenant database, a shared database with a fixed extension set, and a shared database with custom extensions.
Dedicated Tenant Database
The first approach involves simply giving each tenant its own database, which the tenant can extend as necessary.
With this approach, a new standard default database is created for a new tenant as part of the provisioning process, and the metadata service keeps track of which database is assigned to which tenant. Once the new database is created, the tenant is free to modify it as extensively as your application's user interface and program logic allow, potentially creating new fields, new queries, and even new tables and relationships.
If the cost of providing services were not a factor, this would be the only approach to consider, because it is the simplest arrangement to build, and it offers customers the maximum freedom to extend your default data model. Moreover, customers in fields such as banking or medical records management may have very strong data isolation requirements, and may not even consider an application that does not supply each customer with its own individual database. The disadvantage of this approach is that you will be able to support only a limited number of databases for each server, and therefore your infrastructure cost will be higher, and it will rise more quickly than it would otherwise.
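A minimal sketch of this arrangement, using in-memory SQLite databases as stand-ins for per-tenant databases, might look like the following. The TenantCatalog class plays the part of the metadata service; its name, the jobs table, and the tenant names are all hypothetical.

```python
import sqlite3

# Standard default schema created for every new tenant (illustrative).
DEFAULT_SCHEMA = """
CREATE TABLE jobs (
    job_id INTEGER PRIMARY KEY,
    title  TEXT NOT NULL
);
"""


class TenantCatalog:
    """Stand-in for the metadata service that maps each tenant to its database."""

    def __init__(self):
        self._databases = {}

    def provision(self, tenant_id):
        if tenant_id in self._databases:
            raise ValueError(f"tenant {tenant_id!r} already provisioned")
        conn = sqlite3.connect(":memory:")  # one separate database per tenant
        conn.executescript(DEFAULT_SCHEMA)
        self._databases[tenant_id] = conn
        return conn

    def connection_for(self, tenant_id):
        return self._databases[tenant_id]


def job_columns(conn):
    return [row[1] for row in conn.execute("PRAGMA table_info(jobs)")]


catalog = TenantCatalog()
catalog.provision("contoso")
catalog.provision("fabrikam")

# Each tenant extends its own copy of the schema without affecting the other.
catalog.connection_for("contoso").execute(
    "ALTER TABLE jobs ADD COLUMN classification_code TEXT")
catalog.connection_for("fabrikam").execute(
    "ALTER TABLE jobs ADD COLUMN category_id INTEGER")

print(job_columns(catalog.connection_for("contoso")))
# ['job_id', 'title', 'classification_code']
print(job_columns(catalog.connection_for("fabrikam")))
# ['job_id', 'title', 'category_id']
```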
Shared Database, Fixed Extension Set
The second approach involves building a single database that is shared by all of your tenants, and that includes a preset number of custom fields that tenants can assign and use as desired (see Figure 15).
Figure 15. Custom fields in a shared database
In Figure 15, records from different customers are intermingled in a single table; a TenantID field associates each record with an individual tenant. In addition to the standard set of fields, a number of custom fields are provided, and each customer can choose what to use these fields for, and how data will be collected for them.
Custom fields can be typed, so that the customer can use any available built-in type checking and verification functions that the application and database provide in order to validate the data. Alternatively, the fields can be untyped, so that the customer can use them to store any type of data. (The customer can optionally provide its own validation logic, to prevent users from accidentally entering invalid data.)
A shared database carries a much lower cost of providing services than the isolated approach does, because it allows a single database engine to support a larger number of customers before partitioning becomes necessary. The biggest disadvantage to this approach is that the extensibility of the data model is limited to the number of custom fields you provide. Choosing this number wisely requires carefully assessing your customers' potential needs. If there are too few custom fields, your customers will not be able to use your application effectively; if there are too many, the result is a sparse, wasteful database with many unused fields.
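The fixed-extension approach can be sketched with a single shared table that carries a preset number of untyped custom columns, plus a small metadata table recording what each tenant uses them for. The table names, column names, and the choice of three custom fields are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (
    tenant_id INTEGER NOT NULL,
    job_id    INTEGER NOT NULL,
    title     TEXT NOT NULL,
    custom1   TEXT,   -- untyped custom fields, meaning chosen by each tenant
    custom2   TEXT,
    custom3   TEXT,
    PRIMARY KEY (tenant_id, job_id)
);
CREATE TABLE custom_field_labels (
    tenant_id   INTEGER NOT NULL,
    column_name TEXT NOT NULL,
    label       TEXT NOT NULL,
    PRIMARY KEY (tenant_id, column_name)
);
""")

CUSTOM_COLUMNS = ("custom1", "custom2", "custom3")

# Tenant 1 uses custom1 for a classification code; tenant 2 uses it for a category ID.
conn.execute("INSERT INTO custom_field_labels VALUES (1, 'custom1', 'classification_code')")
conn.execute("INSERT INTO custom_field_labels VALUES (2, 'custom1', 'category_id')")
conn.execute("INSERT INTO jobs (tenant_id, job_id, title, custom1) "
             "VALUES (1, 1, 'Install panel', 'EL-204')")
conn.execute("INSERT INTO jobs (tenant_id, job_id, title, custom1) "
             "VALUES (2, 1, 'Repair roof', '42')")


def jobs_for_tenant(tenant_id):
    """Return one tenant's records, with custom columns renamed to that tenant's labels."""
    labels = dict(conn.execute(
        "SELECT column_name, label FROM custom_field_labels WHERE tenant_id = ?",
        (tenant_id,)))
    rows = conn.execute(
        "SELECT job_id, title, custom1, custom2, custom3 FROM jobs "
        "WHERE tenant_id = ? ORDER BY job_id", (tenant_id,))
    records = []
    for job_id, title, *custom in rows:
        record = {"job_id": job_id, "title": title}
        for column, value in zip(CUSTOM_COLUMNS, custom):
            if column in labels:  # skip custom fields this tenant has not assigned
                record[labels[column]] = value
        records.append(record)
    return records


print(jobs_for_tenant(1))
# [{'job_id': 1, 'title': 'Install panel', 'classification_code': 'EL-204'}]
```

Every query filters on tenant_id, which is what keeps intermingled records apart. The unused custom2 and custom3 columns in both rows also illustrate the sparseness that results from providing more custom fields than tenants need.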
Shared Database, Custom Extensions
The third approach involves building a single, shared database, and allowing customers to extend the data model arbitrarily, storing custom data as name–value pairs in a separate table (see Figure 16).
Figure 16. Custom data stored in a separate extension table
Here, each customer record that includes custom data is assigned a unique record ID, which matches one or more rows in a separate extension table. For each row in this table, a name–value pair is stored. Each customer can create as many of these name–value pairs as necessary to meet their business needs. When the application retrieves a customer record, it performs a lookup in the custom data table, selects all rows corresponding to the record ID, and returns them to be treated as ordinary field data. Obviously, data in the custom data table cannot be typed, because it is likely to contain data in many different forms for different customers. To work around this limitation, a third column can optionally hold a data type identifier, so that the data can be cast to the appropriate data type once it is retrieved.
This approach makes the data model arbitrarily extensible, while retaining the cost benefits of using a shared database. The main disadvantage is an added level of complexity for database functions, such as searching, indexing, querying, and updating records. This is typically the best approach to take if you anticipate that your customers will require a considerable degree of flexibility in extending the default data model, but that they won't require data isolation.
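A sketch of the extension-table approach follows, including the optional third column that holds a data type identifier. The table and function names are hypothetical, and only three simple types are supported here to keep the casting logic short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (
    record_id INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    title     TEXT NOT NULL
);
CREATE TABLE job_extensions (
    record_id INTEGER NOT NULL REFERENCES jobs(record_id),
    name      TEXT NOT NULL,
    value     TEXT NOT NULL,   -- every value is stored as text
    type_id   TEXT NOT NULL,   -- identifies how to cast the value on retrieval
    PRIMARY KEY (record_id, name)
);
""")

CASTS = {"str": str, "int": int, "float": float}


def save_job(tenant_id, title, **custom):
    cursor = conn.execute(
        "INSERT INTO jobs (tenant_id, title) VALUES (?, ?)", (tenant_id, title))
    record_id = cursor.lastrowid
    for name, value in custom.items():
        type_id = type(value).__name__
        if type_id not in CASTS:
            raise TypeError(f"unsupported custom field type: {type_id}")
        conn.execute("INSERT INTO job_extensions VALUES (?, ?, ?, ?)",
                     (record_id, name, str(value), type_id))
    return record_id


def load_job(tenant_id, record_id):
    row = conn.execute(
        "SELECT title FROM jobs WHERE record_id = ? AND tenant_id = ?",
        (record_id, tenant_id)).fetchone()
    if row is None:
        return None
    record = {"title": row[0]}
    # Fold the name-value pairs back in so that they look like ordinary fields.
    for name, value, type_id in conn.execute(
            "SELECT name, value, type_id FROM job_extensions WHERE record_id = ?",
            (record_id,)):
        record[name] = CASTS[type_id](value)
    return record


first = save_job(1, "Install panel", classification_code="EL-204")
second = save_job(2, "Repair roof", category_id=42)
print(load_job(1, first))   # {'title': 'Install panel', 'classification_code': 'EL-204'}
print(load_job(2, second))  # {'title': 'Repair roof', 'category_id': 42}
```

Note that even this small example needs a second query per record to assemble the custom fields, which hints at the added complexity for searching and indexing.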
When developing an extensibility approach for your data model, remember that any extension implemented by a customer will require a corresponding extension to the business logic (so that the application can use the custom data), as well as an extension to the presentation logic (so that users have a way to enter the custom data as input and receive it as output). The configuration interface that you present to the customer should therefore provide mechanisms for updating all three, preferably in an integrated fashion. (Providing mechanisms by which customers may extend the business logic and user interface will be addressed in a future paper.)
Scalability
Large-scale enterprise software is intended to be used by thousands of people simultaneously. If you have experience building enterprise applications of this sort, you've gotten to know first-hand the challenges of creating a scalable architecture. For a SaaS application, scalability is even more important: you'll have to support the average user base of a single customer, multiplied by the total number of customers that you have. For ISVs accustomed to building on-premise enterprise software, supporting this kind of user base is like moving from the minor leagues to the majors: the rules may be familiar, but the game is played on an entirely different level. Instead of a widely deployed, business-critical enterprise application, you're really building an Internet-scale system that needs to actively support a user base potentially numbering in the millions.
Scaling the Application
Of course, it's very unlikely that you'll end up supporting as many users as Hotmail does (though if you do, congratulations!). But the scalability challenges are actually quite similar.
Applications can be scaled up (by moving the application to a larger, more powerful server) and scaled out (by running the application on more servers). Scaling up, a familiar solution to anyone who's ever replaced an aging computer with a brand-new model, is often the better choice for smaller applications that don't have to serve very many concurrent users. At the SaaS level, though, scaling out is almost always the best way to add capacity, as depicted in the SaaS maturity model. A well-designed SaaS application can be scaled out to an arbitrarily large number of servers, each running one or more identical instances of the application. The following are some guidelines for designing an application for "scale out":
- Design the application to run in a stateless fashion, with any necessary user and session data stored either on the client side, or in a distributed store that's accessible to any application instance. Statelessness means that each transaction can be handled by one instance as well as any other; a user may transact with dozens of different instances during a single session, without ever knowing it.
- Design the application to conduct I/O operations asynchronously, so that the application can perform useful work while waiting for input and output to complete.
- Pool resources such as threads, network connections, and database connections; this helps maximize your computing resources, and it improves your ability to predict resource usage.
- Write your database operations in such a way as to maximize concurrency and minimize exclusive locking. For example, don't lock records when performing read-only operations.
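The first guideline, statelessness, can be illustrated with a minimal sketch. Everything here is hypothetical: `SharedSessionStore` is an in-memory stand-in for a distributed store, and `AppInstance` and `add_to_cart` are invented names. The point is only that no instance keeps session data of its own, so consecutive requests in one session can land on different instances.

```python
class SharedSessionStore:
    """In-memory stand-in for a distributed session store (assumption:
    a real deployment would use an external cache or database that
    every application instance can reach)."""

    def __init__(self):
        self._sessions = {}

    def load(self, session_id):
        # Return a shallow copy of the stored state (empty if unknown).
        return dict(self._sessions.get(session_id, {}))

    def save(self, session_id, state):
        self._sessions[session_id] = dict(state)


class AppInstance:
    """One of many identical application instances. It holds a reference
    to the shared store but no per-user state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        # Load state, apply the change, and write it back before returning,
        # so the next request can be served by any other instance.
        state = self.store.load(session_id)
        state["cart"] = list(state.get("cart", [])) + [item]
        self.store.save(session_id, state)
        return {"handled_by": self.name, "cart": state["cart"]}


store = SharedSessionStore()
instances = [AppInstance(f"instance-{n}", store) for n in range(3)]

# Simulate a load balancer sending each request to a different instance.
responses = [
    instances[i % len(instances)].add_to_cart("session-42", item)
    for i, item in enumerate(["report", "license", "support-plan"])
]
```

Each of the three requests is handled by a different instance, yet the last response contains all three items, because the session lives in the shared store rather than in any one process.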
Of course, this is only the very briefest of examinations of the topic; volumes could be (and have been) written about implementing a scalable architecture. For some additional guidance, see the Performance & Scalability resources published by Microsoft Patterns & Practices.
Scaling the Data
As databases serve more users concurrently and grow in size, the amount of time it takes to perform operations such as querying and searching increases significantly. SaaS applications, which often use the same databases to serve thousands of customers, are particularly susceptible to these types of performance degradation, and therefore it's important to plan adequately for growth.
One fairly simple way to scale a database is through partitioning: dividing the data into smaller "chunks" in order to improve the efficiency of queries and updates. Choose a partitioning strategy that matches how your data is distributed and accessed. For example, if an application has customers from around the world, a geographic partitioning strategy might be appropriate, with data belonging to European customers in one partition, data belonging to Asian customers in another, and so on.
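A geographic strategy of this kind might be sketched as a simple routing function. The region names, partition names, and the shape of the tenant record below are all assumptions made for illustration; they are not part of any particular database product.

```python
# Hypothetical mapping from a customer's region to a database partition.
PARTITION_BY_REGION = {
    "europe": "partition-eu",
    "asia": "partition-asia",
    "americas": "partition-americas",
}
DEFAULT_PARTITION = "partition-default"


def partition_for(tenant):
    """Return the partition that should hold this tenant's data.

    `tenant` is assumed to be a dict with a "region" key; tenants from
    regions without a dedicated partition fall back to the default.
    """
    return PARTITION_BY_REGION.get(tenant["region"].lower(), DEFAULT_PARTITION)


tenants = [
    {"name": "Contoso", "region": "Europe"},
    {"name": "Fabrikam", "region": "Asia"},
    {"name": "Northwind", "region": "Oceania"},
]
routing = {t["name"]: partition_for(t) for t in tenants}
```

Every data access for a tenant would first consult this function (or a cached result of it) to decide which partition to query.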
In most situations, it is likely that database size will keep growing. Therefore, it is also important to have dynamic repartitioning strategies in place, to ensure that already-partitioned data can be repartitioned in order to keep up with performance and scale metrics.
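One possible way to approach repartitioning is to keep a directory that maps each tenant to its partition, and to move tenants when a partition grows past a limit. The sketch below is only one such policy, under stated assumptions: the function name `rebalance`, the size figures, and the "move the smallest tenants first" rule are all hypothetical, and a real system would also have to copy the data and switch traffic safely.

```python
def rebalance(directory, tenant_sizes, partitions, max_load):
    """Plan tenant moves so that no partition exceeds max_load, if possible.

    directory    -- dict mapping tenant -> current partition
    tenant_sizes -- dict mapping tenant -> data size (any consistent unit)
    partitions   -- list of all partition names, including empty ones
    Returns (new_directory, moves); each move is (tenant, source, target).
    """
    directory = dict(directory)  # work on a copy; the input is not modified
    loads = {p: 0 for p in partitions}
    for tenant, partition in directory.items():
        loads[partition] += tenant_sizes[tenant]

    moves = []
    for source in partitions:
        # Consider the smallest tenants first: they are the cheapest to move.
        candidates = sorted(
            (t for t, p in directory.items() if p == source),
            key=lambda t: tenant_sizes[t],
        )
        for tenant in candidates:
            if loads[source] <= max_load:
                break  # this partition is back within its limit
            target = min(partitions, key=lambda p: loads[p])
            size = tenant_sizes[tenant]
            if target == source or loads[target] + size > max_load:
                break  # nowhere to put this tenant; more capacity is needed
            directory[tenant] = target
            loads[source] -= size
            loads[target] += size
            moves.append((tenant, source, target))
    return directory, moves


current = {"a": "p1", "b": "p1", "c": "p1"}
sizes = {"a": 40, "b": 30, "c": 50}
new_directory, moves = rebalance(current, sizes, ["p1", "p2"], max_load=80)
```

Because the application looks up the partition through the directory, moving a tenant changes one directory entry rather than the application code.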
Operational Structure
The third important shift in thinking has to do with the operational structure of the application: what it takes to deliver the application to customers, and to keep it available and running well at a cost-effective level. For many ISVs, which have never had to run a data center for their customers, this may be the most unfamiliar aspect of SaaS. SaaS providers not only have to be experts in building software and bringing it to market, they must also become experts in operating and managing it.
Resources such as the Microsoft Operations Framework (MOF) provide a great deal of relevant guidance for maintaining system reliability, availability, supportability, and manageability. In addition to the common operation issues that MOF is designed to address, SaaS presents some unique challenges of its own.
Shared Services
If you've had experience with an enterprise-level World Wide Web presence, you're already familiar with the fundamentals of Web hosting and middleware services, in which an organization either hosts a site internally, or contracts with an external provider for equipment co-location or full-service hosting, including hardware, storage, and network bandwidth. The hosting service is responsible for the availability of the site, but it is typically not otherwise responsible for the site's operation and maintenance.
Providing software as a service adds an additional layer to consider when making hosting arrangements (see Figure 17). Depending on your business plan, you may need a metering and billing system in order to do the following:
- Accurately track customers' usage, and bill them for time or resources used.
- Restrict or throttle access at certain times of the day, or in order to meet other criteria.
- Monitor site access and performance, to ensure that SLAs are being met.
- Perform other functions in order to ensure a seamless experience for your customers that meets or exceeds expectations.
Collectively, the systems used to perform these functions are known as shared services.
Figure 17. Shared services layer for SaaS hosting
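To make the metering, billing, and throttling functions listed above more concrete, here is a minimal sketch. The class name `UsageMeter`, the per-window quota, and the flat price per unit are assumptions for illustration only; real rating and billing rules are usually far more involved.

```python
from collections import defaultdict


class UsageMeter:
    """Tracks usage per tenant and refuses requests beyond a quota.

    Assumptions: usage is counted in abstract "units", every tenant has
    the same quota, and the counters are reset elsewhere at the end of
    each billing window.
    """

    def __init__(self, quota_per_window):
        self.quota = quota_per_window
        self.usage = defaultdict(int)

    def record(self, tenant_id, units=1):
        """Record usage and return True, or return False if the request
        would exceed the tenant's quota (that is, it is throttled)."""
        if self.usage[tenant_id] + units > self.quota:
            return False
        self.usage[tenant_id] += units
        return True

    def invoice(self, price_per_unit_cents):
        """Return the amount owed by each tenant, in cents."""
        return {t: used * price_per_unit_cents for t, used in self.usage.items()}


meter = UsageMeter(quota_per_window=3)
acme_results = [meter.record("acme") for _ in range(4)]
globex_result = meter.record("globex", units=2)
bill = meter.invoice(price_per_unit_cents=5)
```

In this run the fourth request from "acme" is throttled, and the invoice charges only for the usage that was actually accepted.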
Shared services can be further classified into two subcategories:
- Operational support services (OSS)—Handle operational issues such as account activation, provisioning, service assurance, usage, and metering.
- Business support services (BSS)—Support billing (including invoicing, rating, taxation, and collections) and customer management (which includes order entry, customer self services, customer care, trouble ticketing, and customer relationship management).
As with traditional Web hosting, you will need to decide whether to build the shared services layer yourself and self-host your application, or to contract with an external hosting company (known as a SaaS provider) to provide it. SaaS providers offer a set of shared services to handle the business and operational issues identified above.
Monitoring
The SLAs that you enter into with your customers will quantify the operational standards that you are required to meet. SLAs are legally binding contracts, and failing to meet them can mean significant lost revenue and damage to your reputation. Monitoring your application architecture for any sign of trouble is therefore a vital tool for detecting problems, and fixing them before they result in significant outages or performance degradation.
Monitoring for Availability
Assuring high availability should be one of the most important priorities for any SaaS vendor. An outage that affects a single server or data center could lead to significant data or productivity losses for a large percentage of your customers—and maybe your entire customer base! For ISVs moving to SaaS from a background in traditional desktop or client–server software development, the high-availability requirements of a net-centric application model can involve new and unfamiliar challenges. It is recommended that you build support for basic techniques, such as heartbeat monitoring and alert mechanisms, into your application, and that you pay special attention to potential weak links, such as a connection to a database at a remote site not under your control.
Of course, technical mechanisms such as alerts are only a part of the process of ensuring high availability—and if an alert goes off, but nobody responds, it can't really be said to be part of the process at all. Ensure that there are processes in place at your operations center that prescribe specific courses of action, and standards to achieve, in the event of a system failure.
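The heartbeat technique mentioned above can be sketched in a few lines. This is an illustration under assumptions, not a production monitor: the class name `HeartbeatMonitor`, the component names, and the 30-second timeout are hypothetical, and timestamps are passed in explicitly so that the logic is easy to test (a live system might take them from a monotonic clock).

```python
class HeartbeatMonitor:
    """Remembers when each component last reported in, and lists the
    components that have been silent for longer than the timeout."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}

    def beat(self, component, now):
        # Called whenever a component sends its periodic "I'm alive" signal.
        self.last_seen[component] = now

    def stale(self, now):
        # Components whose last heartbeat is older than the timeout.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout_seconds=30)
monitor.beat("web-1", now=0)
monitor.beat("db-remote", now=0)
monitor.beat("web-1", now=25)          # web-1 keeps reporting
overdue = monitor.stale(now=40)        # db-remote has been silent for 40 s
```

The list returned by `stale` is what would feed an alert; who responds to that alert, and how quickly, is the process question raised above.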
For an overview of the issues surrounding high availability, see "Service Management Functions: Availability" on Microsoft TechNet.
Monitoring for Performance
Your customers expect you to provide them with application access at an acceptable level of performance. To some extent, this expectation will be made explicit by the SLAs that you agree to honor as part of your contract with the customer. Beyond SLAs, however, if customers perceive your application to be slow or unresponsive, they will be more likely to terminate or decline to renew their subscriptions; disgruntled users may make their feelings known on websites and in the pages of industry publications, thus giving your application a negative reputation. Conversely, a fast, lean application that meets users' needs will please customers, and—if they've moved to your software from a less responsive traditional software package—even make them more receptive to SaaS as a category.
To ensure a high level of performance, build support for performance counters into your application directly, if at all possible. Set performance thresholds for metrics such as CPU usage and application response times, and use alerts to notify the appropriate personnel when those thresholds are exceeded.
Establishing a baseline for performance is generally the most critical activity. With an established baseline, it is much easier to tell when something abnormal is happening, and where the problem is.
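As a rough sketch of how a baseline can be put to use, the example below summarizes historical response times and flags a new measurement that falls well outside the normal range. The function names, the sample values, and the rule "mean plus three standard deviations" are assumptions chosen for illustration; the appropriate threshold depends on your application and your SLAs.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical measurements (for example, response times in
    milliseconds) as a mean and a sample standard deviation."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def is_abnormal(value, baseline, tolerance=3.0):
    """Return True if value exceeds the baseline mean by more than
    `tolerance` standard deviations."""
    return value > baseline["mean"] + tolerance * baseline["stdev"]


history_ms = [100, 110, 90, 105, 95]   # hypothetical response times
baseline = build_baseline(history_ms)

normal_check = is_abnormal(120, baseline)   # within the usual range
slow_check = is_abnormal(200, baseline)     # far above the baseline
```

A check like `is_abnormal` would typically run against each new measurement and raise an alert when it returns True.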
Conclusion
There's plenty more to be said about each of the topics addressed in this paper, but hopefully, by this point, you've read enough to begin developing a conceptual framework for understanding SaaS, and how you and your customers may benefit from it. SaaS represents a new paradigm in software delivery, an architectural model built on the principles of multi-tenant efficiency, massive scalability, and metadata-driven configurability to deliver good software inexpensively to existing and potential customers. Adopting these principles now can help put you well on the path to transforming the way you capture the long tail business.
For more information, please see "Multi-Tenant Data Architecture."
Acknowledgements
Many thanks to Paul Henry for his help with technical writing.
Feedback
The authors gladly welcome your feedback about this paper. Please e-mail all feedback to fredch@microsoft.com or gianpc@microsoft.com. Thank you.