Ref: 20091117
Last edited 29th September 2011
Introduction

The purpose of this article is to explain the deployment options for the Triaster Server, and the most important factors to take into account when planning a deployment.
From 2010 onwards, all Triaster server products - previously referred to as The Triaster Publication Server (for Sharing processes), The Triaster Browser Toolkit Server (for Using processes) and The Triaster Improvement Workbench Server (for Improving processes) - are combined into a single Triaster Server. The minimum system requirements for a Triaster Server are available in the following article: What are the Minimum System Requirements? In smaller deployments, a single Triaster Server operating on a single CPU is sufficient. However, as demand on the physical server hardware increases, it is beneficial to "spread the load", or "scale", the Triaster solution so that more people can use it without loss of performance.
Furthermore, to ensure high levels of up-time and resilience, it is helpful to plan in advance for general classes of system failure, to provide a failover mechanism for when things go wrong, and to maintain test environments so that the live environment is never contaminated by map, software or operating system changes that cause failure.
Planning Factors

There are two main planning factors: Resilience and the end-user experience.
The first factor, Resilience, is concerned with ensuring that the live environment accessed by the end-users is always available, and that safeguards are in place to detect problems well before they can spread to the live environment.
The second factor is the end-user experience. The end-user is the person who accesses the process library to find information to help them perform their tasks. Generally, the end-user experience is highly dependent on the following:
Most end-users will be dissatisfied if the time taken for a process map to load, or a search to return a result, is noticeable. Ideally we are dealing with fractions of a second; a couple of seconds from time to time; and, very rarely, up to 10 seconds.
The planning strategy then is to ensure that the end-user experience is comfortably within the page load times described above.
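The load-time bands above can be sketched as a small helper. This is a minimal illustration, assuming band boundaries drawn from the guidance in this article; the function name and thresholds are not part of any Triaster product.

```python
# Hypothetical helper: classify a measured page load time against the
# target bands described above. The boundaries are assumptions taken
# from this article's guidance, not Triaster-published limits.

def classify_load_time(seconds: float) -> str:
    """Return a rough satisfaction band for a page load time."""
    if seconds < 1.0:
        return "ideal"        # fractions of a second
    if seconds <= 2.0:
        return "acceptable"   # a couple of seconds, from time to time
    if seconds <= 10.0:
        return "tolerable"    # should be very rare
    return "unacceptable"     # plan additional server capacity

# Example: sample a set of measured load times and classify the worst case
samples = [0.4, 0.8, 1.6, 0.3]
print(classify_load_time(max(samples)))  # acceptable
```

A periodic check like this, run against the live library, gives an early warning that the end-user experience is drifting out of the comfortable range.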
Regarding security, organisations frequently need to separate their web content from their back-end systems. This is a complex topic, highly dependent on individual customer circumstances, and is not covered in this article.
Resilience

This section is concerned with planning for and enabling resilience, i.e. high levels of reliability, up-time and continuity of service. The Triaster Server is a single point of failure, so planning to avoid failures, and to enable rapid recovery when they occur, is important.
The typical non-resilient deployment strategy is shown below:
Process Authors on the left create and modify processes which are then stored on a File Server. The Triaster Server then publishes and serves these maps to the end-user population via HTTP.
There are generally two classes of problem that can cause the Triaster solution not to work as intended:
It is therefore advised that safeguards be implemented to prevent either type of issue reaching the Triaster Server that is responsible for delivering content to the end-users.
Resilience Step 1 - Organise so that the Triaster Server is Easily Recoverable

We recommend:
No user data is stored on the Triaster Server, so that over time the only changes that take place on the server are those associated with software updates and the output of a publish.
Resilience Step 2 - Implement a Test Server

A large class of problems can be avoided if a Test Server is in place to act as the first safeguard against issues being introduced into the live system. All Triaster customers with a Trusted Partner Agreement are entitled to use a Test Server as a free benefit of that agreement.
A Test Server can accomplish all of the following:
A Test Server need not replicate the live server to the nth degree; it merely needs to trap the large class of potential problems that can be spotted by publishing, upgrading or applying operating system updates.
A Test Server can also be a virtual machine.
The diagram below shows how a Test Server can be brought into the server deployment strategy.
Two (completely independent) servers are used. The Test Server runs alongside the Live Server, and any configuration changes or software updates are applied to it first. Publishes can also take place on the Test Server before performing publishes on the Live Server.
Note that the Test Server should not be confused with test sites or test libraries. For example, there may be 6 libraries on the Live Server, each with a Sandpit, Pre-live and Live site within them. These should all be replicated on the Test Server. The purpose of the Sandpit, Pre-Live and Live sites is to manage and approve content, not to test configuration changes or software updates.
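The "test first, then live" rule described above can be expressed as a simple gate. This is a minimal sketch, assuming a hypothetical record of which server each change has been applied to; the change and server names are illustrative only.

```python
# Sketch of the promotion rule: a change (publish, software update, OS
# patch) may reach the Live Server only after it has been proven on the
# Test Server. Change names and server labels are hypothetical.

def can_apply_to_live(change: str, applied: dict) -> bool:
    """True only if the change has already been applied to the Test Server."""
    return "test" in applied.get(change, set())

def apply_change(change: str, server: str, applied: dict) -> bool:
    if server == "live" and not can_apply_to_live(change, applied):
        return False  # blocked: not yet proven on the Test Server
    applied.setdefault(change, set()).add(server)
    return True

applied = {}
# Applying straight to live is blocked; test first, then live succeeds.
assert apply_change("software-update-11.1", "live", applied) is False
assert apply_change("software-update-11.1", "test", applied) is True
assert apply_change("software-update-11.1", "live", applied) is True
```

However this gate is enforced in practice (change-control paperwork or automation), the point is the ordering: nothing reaches the Live Server untested.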
Resilience Step 3 - Implement a Failover Server

The purpose of the Failover Server is to provide continuity of service if, for whatever reason (though generally hardware failure), the Live Server fails.
Since virtualising the Triaster Server and backing it up removes the need for a failover, we no longer recommend a dedicated Failover Server be created unless there are unusually strong availability requirements.
Performance

Scaling is the process of maintaining satisfactory levels of performance while increasing the end-user population and the total amount of content in the libraries.
The recommendation regarding scaling is to first implement Resilience as described above. The scaling steps are then simply the introduction of a cluster of Live Triaster Servers that each take specialist roles (role partitioning), service a sub-set of all the content (content partitioning), or share the processing of long tasks between them (load balancing).
Our recommended starting point for all customers is to implement a resilient two server cluster with role partitioning, as described below.
Scaling Step 1 - A Resilient Two Server Cluster with Role Partitioning

Publishing maps to HTML and cloning libraries or sites are CPU-intensive tasks. When a publish is triggered or a site is cloned, the Triaster Server CPU can be used very heavily for several hours at a time, and disk activity is intensive. If end-users are trying to access content in a library while a publish or a library clone is taking place, page load times will increase. The first step in scaling is therefore to create a two server cluster with role partitioning, dedicating one server to the role of publishing the source maps to HTML and one to the role of serving HTML to the end-users.
This is the standard implementation approach that Triaster recommends and performs for all but the smallest of organisations (up to the low hundreds of desktops).
By isolating and dedicating one Triaster Server to the role of publishing the maps (and cloning libraries) and one to the role of serving the HTML, publishing and cloning operations will not impact page load times or the end-user experience.
Furthermore, the HTML server is the only Triaster Server that faces end-users and is accessed over HTTP. By isolating this server, into a DMZ for example, the back-end processes associated with cloning and publishing can continue unaffected.
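The two-server role split can be pictured as a small configuration map. This is an illustrative sketch only; the host names and role labels are hypothetical, not Triaster configuration settings.

```python
# Illustrative two-server role partition. Host names are hypothetical;
# the role split (publish/clone vs. serve HTML) follows the text above.

cluster = {
    "TRIASTER-PUB":  {"roles": {"publish", "clone"}, "end_user_facing": False},
    "TRIASTER-HTML": {"roles": {"html"},             "end_user_facing": True},
}

def servers_for(role: str) -> list:
    """List the servers in the cluster that hold a given role."""
    return sorted(h for h, cfg in cluster.items() if role in cfg["roles"])

# Only the HTML server faces end-users (and so is the one to place in a
# DMZ); publishing load never competes with page serving.
assert servers_for("html") == ["TRIASTER-HTML"]
assert servers_for("publish") == ["TRIASTER-PUB"]
```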
Scaling Step 2 - A Resilient Three Server Cluster with Role Partitioning and Content Partitioning

Eventually, as the number of libraries grows, publishing times will increase to such a degree that the delay between requesting a publish and the site being published becomes too long. The second scaling step is therefore to add an additional Triaster Server to create a resilient three server cluster as shown below. Two of the servers are dedicated to publishing the process maps (and cloning libraries), and a single server is devoted to serving the resultant HTML. In version 10.1 of the Triaster Solution, content partitioning should be used so that each Triaster Server performing a publish role is dedicated to publishing a specific set of libraries (as opposed to load balancing, wherein both servers simultaneously publish parts of the same library). In version 11.1 of the Triaster Solution and later, load balancing is available as well as content partitioning.
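Content partitioning amounts to a fixed ownership map from libraries to publish servers. The sketch below illustrates the idea; the library and server names are hypothetical examples, not a Triaster configuration format.

```python
# Sketch of content partitioning: each publish server owns a fixed set
# of libraries, so the two servers never publish parts of the same
# library. Library and server names are hypothetical.

partition = {
    "PUB-A": ["HR", "Finance"],
    "PUB-B": ["Operations", "IT", "Quality"],
}

def publisher_for(library: str) -> str:
    """Find which publish server owns a given library."""
    for server, libraries in partition.items():
        if library in libraries:
            return server
    raise KeyError(f"no publish server owns library {library!r}")

assert publisher_for("Finance") == "PUB-A"
assert publisher_for("IT") == "PUB-B"
```

Under load balancing (version 11.1 and later), this fixed map is replaced by dynamic allocation: both servers can work on parts of the same publish job.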
The important point to note is that a Triaster Server can perform any role (test, failover, publish, HTML). So the specific needs of the project determine precisely how a 3-server cluster would be configured.
For example, suppose a University has process libraries which are staff facing only, and others which are student facing. A natural partition strategy may be to dedicate two Triaster Servers to the HTML role, one for staff and one for students, both serviced by a single Triaster Server dedicated to publishing and cloning.
As a second example of two Triaster Servers in the HTML role, suppose a large organisation has many process libraries on a single Triaster Server, and it is now about to process map a highly secure process, secure even to internal staff. In this case, the highly secure process could be isolated onto its own Triaster Server and subject to different lock-down and security measures.
However, for the bulk of existing Triaster customers, adding additional publication capacity to the server cluster is the most natural scaling step.
Scaling Step 3 - An n-Server Cluster

The third scaling step is to add more servers into the cluster, each of which takes a specialist role, or services the needs of a specific set (or partition) of libraries.
Suppose a global organisation requires process libraries to be delivered in several different countries. Rather than end-users in each country having to load the libraries from a Triaster Server on a different continent, their experience will be much better if they can load them from the local LAN. In this example, a Triaster Server would be installed in each country.
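Directing each country's end-users to their local server is, in effect, a country-to-host lookup. The sketch below assumes hypothetical host names and a fallback server; how the routing is actually done (DNS, proxy rules, separate URLs) depends on the organisation's infrastructure.

```python
# Sketch of routing end-users to the Triaster Server in their own
# country. Country codes, host names and the fallback are hypothetical.

library_hosts = {
    "UK": "triaster-library.example.co.uk",
    "US": "triaster-library.example.com",
    "AU": "triaster-library.example.com.au",
}

def host_for(country: str) -> str:
    # Countries without a local server fall back to the UK host
    return library_hosts.get(country, library_hosts["UK"])

assert host_for("US") == "triaster-library.example.com"
assert host_for("FR") == "triaster-library.example.co.uk"
```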
Scaling Step 4 - Network Load Balanced Clusters

Several HTML Servers can form a Network Load Balanced Cluster (NLBC). This covers the case where a specific library, already isolated onto a dedicated HTML Server as part of a partitioning strategy, is under such heavy usage that the end-user experience is still poor.
For example, suppose there are 10,000 users of an HR library, and each year during the appraisal month the library comes under very heavy usage and the server is not sufficiently powerful to maintain reasonable page load times. By clustering two or more HTML Servers into an NLBC, the load is split across the servers (essentially, each request is allocated to the next available server), and in this way page load times can be improved.
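The "each request goes to the next available server" behaviour is round-robin allocation, which can be sketched in a few lines. This is an illustration of the scheduling idea only, assuming hypothetical server names; a real NLBC is configured at the operating system or network level, not in application code.

```python
# Round-robin request allocation, as in a Network Load Balanced
# Cluster: each incoming request goes to the next server in turn.
# Server names are hypothetical.
from itertools import cycle

servers = ["HTML-1", "HTML-2", "HTML-3"]
next_server = cycle(servers)

def route_request() -> str:
    """Allocate the next incoming request to the next server in turn."""
    return next(next_server)

# Six requests are spread evenly: each server receives two of them
allocated = [route_request() for _ in range(6)]
print(allocated)
```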
Usage Profiles and Server Numbers

It is difficult to be precise about the number of servers required to meet the needs of different user populations. One person accessing the library once a day places exactly the same demand on the server in an organisation of 10 as in an organisation of 10,000.
To help with planning decisions, consider the following KPIs in respect of the performance standards you expect from the Triaster solution. What are the Typical Expected and Worst Case values you require for your project? If you find that performance is low in a particular area, add server capacity as appropriate to address the specific bottleneck.
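The Expected / Worst Case framing above can be captured as a simple threshold check. The metric names and values below are purely illustrative assumptions for one hypothetical project, not Triaster recommendations; substitute your own KPIs and targets.

```python
# Hypothetical KPI targets for one project: a Typical Expected value and
# a Worst Case value per metric. All names and figures are illustrative.

kpis = {
    "page_load_seconds": {"expected": 1.0, "worst_case": 10.0},
    "search_seconds":    {"expected": 2.0, "worst_case": 10.0},
    "publish_hours":     {"expected": 2.0, "worst_case": 8.0},
}

def breach_level(metric: str, measured: float):
    """Return which target a measurement breaches, or None if within both."""
    target = kpis[metric]
    if measured > target["worst_case"]:
        return "worst_case"
    if measured > target["expected"]:
        return "expected"
    return None

assert breach_level("page_load_seconds", 0.5) is None
assert breach_level("search_seconds", 4.0) == "expected"
assert breach_level("publish_hours", 12.0) == "worst_case"
```

A metric that regularly breaches its Expected value, or ever breaches its Worst Case value, identifies the bottleneck where server capacity should be added.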
Triaster works on the following rules of thumb, based on the total number of people who will use the content in the library anywhere from once a day to once a year. This is based on how existing customers use the solution, and the publish and page load times they are happy with, as at June 2011. We will update this table regularly as we learn more from each implementation.
Conclusion

The resilience and scaling steps outlined above provide a way for any organisation to scale the Triaster solution up to hundreds of thousands of end-users, across many different countries. A combination of partitioning and clustering provides a straightforward way to respond to increased end-user demand for the content in process libraries.