Updated: Sep 13
At Triseed, we use the latest technology to help our customers auto-scale, modernize, auto-heal, and build a reusable platform. We built a solution for a US-based company and moved it from a monolithic on-prem system to a Kubernetes cloud environment. Because of our expertise, we were able to meet the customer's expectations, budget, and timeline. This is our approach to the lift and shift and the modernization of the entire platform.
Istio is an open-source service mesh that runs on Kubernetes. It lets us define how traffic is routed between the services running inside the cluster, which gives us finer control over each service. The customer required blue-green deployments for their containers, and we achieved this with Istio and a GitHub Actions pipeline.
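To make the blue-green idea concrete, here is a minimal sketch of how a pipeline step could generate an Istio VirtualService that sends all traffic to one color. This is not our exact pipeline code: the service name `orders`, the `blue` and `green` subset names, and the helper function are hypothetical, and the subsets are assumed to be defined in a matching DestinationRule.

```python
import json


def blue_green_virtual_service(service, live_color, namespace="default"):
    """Build a VirtualService manifest that routes 100% of traffic to live_color.

    Assumes a DestinationRule already defines the "blue" and "green" subsets
    for this service (hypothetical names, used for illustration only).
    """
    if live_color not in ("blue", "green"):
        raise ValueError("live_color must be 'blue' or 'green'")

    # One route entry per color; only the live color receives traffic.
    routes = [
        {
            "destination": {"host": service, "subset": color},
            "weight": 100 if color == live_color else 0,
        }
        for color in ("blue", "green")
    ]

    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {"hosts": [service], "http": [{"route": routes}]},
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which kubectl apply also accepts.
    print(json.dumps(blue_green_virtual_service("orders", "green"), indent=2))
```

In a setup like this, switching from blue to green is a single manifest change, and rolling back means applying the previous weights again.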
GitHub Actions is our CI/CD solution. The customer didn't want to use a deployment tool like Argo CD or Flux, since it would be another layer of software to maintain. They wanted to keep this as simple as possible and stay hyper-focused on the code.
The customer was already using AWS and chose EKS so that the Kubernetes control plane is maintained by the cloud provider.
Here are the steps we took to help the company move to Kubernetes:
1. Identify which services will run on Kubernetes and which will run on the cloud provider's managed services. This is the audit period, where both sides collaborate on an almost daily basis.
2. Set up a plan for the failover steps and procedures.
3. Explain the findings to the customer and get feedback.
4. Create a golden Docker base image that developers can use. The customer wants the image refreshed every day, so a scheduled GitHub Action runs every night and rebuilds it with `docker build`, running `apt update` and `apt upgrade` during the build.
5. The customer doesn't like CloudFormation because, in their view, it does not keep a persistent state. We picked Terraform for its robust declarative code and modularity, which makes our Terraform build-out reusable and easy to understand. We built modules that set up VPC, security groups, RDS, EMR, MSK, S3, SQS, SNS, SES, Lambda, API Gateway, Elasticsearch, Redshift, Route 53, and EKS. Terraform also runs in GitHub Actions, so we were able to promote both code and infrastructure across multiple environments.
6. We then updated the GitHub Actions deployment of the customer's applications and added a new step to deploy to the new AWS infrastructure.
7. Because the database is large, we first tried setting up a database replica in the cloud. Our biggest pain point was network latency: for a business running on the East Coast, there was simply too much latency to us-east-2 (Ohio). On top of that, the customer did not want to pay for a dedicated line to AWS, so we decided to set up an ETL process to sync the initial data over to AWS.
8. We planned our steps and the cutover.
9. We did a test run one weekend and learned from our mistakes. Think of it as a disaster recovery exercise.
10. We documented our mistakes and the things we needed to improve, then set aside 2 sprints (4 weeks) to fix them all. Our goal was to make the failover as automated as possible, so we wrote a lot of Bash and Python scripts to simplify the process.
11. We did another dry-run failover and got things to work!
12. There were still things to improve. We realized that we could not use ETL for some databases, so we decided to accept a longer failover downtime: shut down the monolith app during the failover to AWS, create data-at-rest backups for Elastic, Postgres, MS SQL, and Kafka, and then have the team run a script to restore them onto the AWS resources already built with Terraform. The data at rest totals about 6 TB.
13. We went through another planning round for go-live and decided to do all of this from Friday evening to Sunday morning.
14. We were able to test everything, and QA, management, and customer success gave the green light for a successful failover.
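When planning a backup-and-restore cutover like the one in steps 12 and 13, a quick back-of-the-envelope estimate helps check whether about 6 TB can move inside the weekend window. The sketch below is only an illustration: the link speeds and the 80% efficiency factor are assumptions, not measurements from this project, and it ignores the time needed to take the backups and run the restores.

```python
def transfer_hours(data_tb, link_gbps, efficiency=0.8):
    """Estimate hours needed to move data_tb terabytes over a link.

    data_tb uses decimal terabytes (1 TB = 10**12 bytes).
    efficiency is the assumed fraction of the link that is actually usable.
    """
    if data_tb < 0:
        raise ValueError("data_tb must not be negative")
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")

    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    usable_bps = link_gbps * 1e9 * efficiency      # usable bits per second
    return bits / usable_bps / 3600                # seconds -> hours


if __name__ == "__main__":
    # Hypothetical link speeds, for comparison only.
    for gbps in (0.5, 1, 10):
        print(f"{gbps} Gbps: {transfer_hours(6, gbps):.1f} hours")
```

If the estimate for the slowest realistic link already eats most of the window, that is a signal to pre-stage data or extend the planned downtime.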
A failover will not succeed without skilled engineers who have the support of management.
Some AWS tools, like Database Migration Service, might not work in every case because of how customers use the platform. Sometimes it's better to let people do what they are used to, because they have tested it and are confident it will work their way. Even when you use an AWS tool, you may find that the tool itself can be buggy. We learned this when we used AWS Application Migration Service: the servers lost their application license because the hardware spec changed.
If the customer is using a third-party vendor for their software, engage the vendor early in planning, because some things might not work during the migration and you will need their help.
The customer emphasized that Triseed's experienced engineers truly helped with business decisions, because we have a pool of developers, platform engineers, DevOps, infrastructure, and even network engineers on staff.
Lastly, the customer expected and budgeted for this to be done in 3 months. Triseed engineers expected it to take 14 months, but because we worked with the customer, established partnerships with internal developers and outside vendors at an early stage, and planned things correctly, we were able to wrap up the project in 8 months.