By Mandira Shah, Principal Architect – Wipro
By David Marshall and Naidu Annamaneni, Consultants – Wipro
By Sayon Sur, Sr. Solutions Architect – Wipro
By Subrahmanyam Madduru, Global Partner Solutions Architect Lead – AWS
Silicon design companies need huge amounts of compute and storage to run their electronic design automation (EDA) workloads. These demands are dynamic, and companies may need to invest in as much as 4x the steady-state infrastructure to properly handle peak design workloads.
Additionally, as innovation in chip design moves towards chips that are thinner and faster, the demand on IT to meet the compute and storage needs keeps increasing.
The needs at each phase of the chip design lifecycle also vary widely. Some EDA workloads are compute-intensive, others are memory-intensive, and some are both. This spiky, constantly changing demand makes EDA workloads ideally suited for the elasticity of the cloud.
With the need for nearly 4x the steady-state capacity during the tape-out stage, many companies are seeking to optimize their existing infrastructure and achieve the required elasticity by migrating silicon design workloads to the cloud. However, in the absence of tools that can intelligently and efficiently manage resources in the cloud, a company’s costs can escalate quickly.
In this post, we’ll discuss the business and process challenges silicon design companies face and how Nuage, Wipro’s smart orchestrator solution built on Amazon Web Services (AWS), can help accelerate silicon design. Nuage does this through prediction and optimization, leveraging the near-infinite compute, storage, and other resources available on AWS to shorten the product development lifecycle and time to market.
Wipro is an AWS Premier Tier Services Partner with eight AWS Competencies, including Industrial Software Consulting and Data and Analytics Consulting. Wipro is also a member of the AWS Managed Service Provider (MSP) and Well-Architected Partner Programs.
Business and Process Challenges
Current processes and tools within organizations lead to significant delays and poorly optimized use of resources.
Figure 1 – Silicon engineering – existing challenges.
Most organizations rely on design engineers to estimate the resources a job needs, and those estimates are rarely correct. The result is either underutilization or overutilization of resources.
In some cases, organizations have tools that provide design engineers with estimates. However, these numbers are not qualified with sufficient detail, which makes the data unreliable. For instance, if the tool details or flow type are not captured for the baselined tool, the estimate is of little use.
Organizations need a tool that takes the guesswork out of the workload submission process and ensures the job runs on the right host (a server allocated the right amount of processor and memory resources) as the environment hyperscales appropriately.
Most companies have a host of design tools and require users to navigate through those tools to complete their job submission process.
These tools are also incomplete: they do not capture all of the data needed to determine the resource needs of a workload. The data they do capture sits in silos. For example, usage data is in an IT service management (ITSM) system, job data is in the grid engine’s system, technology-specific data is in another system, and project phase information is in a different system altogether.
Stitching this disparate data together requires users to supply specific details at job submission time, which they may or may not do depending on whether they understand the relevance. As a result, data analysis is incomplete and riddled with missing data.
What organizations need is for the process to be automated and completely seamless. This makes for easier adoption and lower dependence on the manual stitching of the tools to see the complete picture.
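To make the stitching problem concrete, here is a minimal sketch of merging per-job records from several silos and flagging what is still missing. The source names, field names, and record layout are all assumptions made for illustration; the post does not describe any actual schema.

```python
from typing import Dict, List

# Hypothetical fields an estimator would need; not taken from the post.
REQUIRED_FIELDS = ["tool", "flow_type", "peak_memory_gb", "project_phase"]


def stitch_job_record(job_id: str, sources: Dict[str, Dict[str, dict]]) -> dict:
    """Merge every source's record for job_id into one flat record."""
    merged = {"job_id": job_id}
    for records in sources.values():
        merged.update(records.get(job_id, {}))
    return merged


def missing_fields(record: dict) -> List[str]:
    """List the required fields that no source supplied."""
    return [field for field in REQUIRED_FIELDS if record.get(field) is None]


# Illustrative data: two silos, neither of which knows the flow type.
sources = {
    "grid_engine": {"job-1": {"tool": "sim", "peak_memory_gb": 48.0}},
    "project_tracker": {"job-1": {"project_phase": "tape-out"}},
}
record = stitch_job_record("job-1", sources)
print(missing_fields(record))
```

A gap such as the missing flow type is exactly what an automated pipeline would need to fill without relying on the user.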
In addition to the wrong resources being assigned to a job, current processes are such that if some additional resources are unavailable, the host is still allocated to the job and sits idle while the grid engine waits for those resources to become available.
Some grid engines come with AWS-specific plugins that can create instances in the cloud, but they are unable to choose the right instance types or take advantage of the lower-cost Spot options provided by AWS.
Optimizing utilization is not only about tightening leaks in the processes, but also about making sure costs are optimized wherever possible.
To get the right elasticity in a cost-effective manner, businesses need an intelligent solution that helps predict system demand and supports their cost and architectural planning. This solution would work in all environments—whether on-premises, in the cloud, or in a hybrid environment.
Built on AWS, Wipro-Nuage is a smart orchestrator that accelerates silicon design through prediction and optimization of resources using AWS cloud-native services. It’s built to support running EDA workloads on AWS and is scheduler agnostic. It’s also a scalable system that provides a flexible architecture addressing specific business objectives of an organization that align with the overall processes within the enterprise.
The goal of Wipro-Nuage is to help a semiconductor enterprise provide a seamless user experience to their chip designers, while enabling the enterprise to scale to its infrastructure requirements efficiently and use resources in a cost-optimized manner.
The following diagram summarizes solution paradigms required to address business objectives, along with the process and design challenges described in the previous section.
Figure 2 – Solution paradigms to address the business objectives.
Seamless User Experience
Designers should be focused on designing, and Wipro-Nuage takes the guesswork out of estimating resource needs of a job. It uses a predictive model, with the prediction engine as an online active process that integrates with the design engineers’ process of job submission. It then intercepts the job and performs the right orchestration to ensure the job is right-sized based on predictions.
Wipro-Nuage ensures the instance types are the right ones for the job. When created on AWS, instances are started only when the job is ready to be executed and terminated as soon as the job is complete, yielding almost 100% utilization. For short-running jobs, Wipro-Nuage reduces costs further by using Spot instances.
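The right-sizing and Spot decisions described above might look roughly like the following sketch. It is not Wipro-Nuage code: the small catalog, the 60-minute cutoff for "short-running", and the function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceOption:
    name: str
    vcpus: int
    memory_gib: int


# Illustrative subset of instance types; a real catalog would be far larger.
CATALOG = [
    InstanceOption("c5.2xlarge", 8, 16),
    InstanceOption("m5.2xlarge", 8, 32),
    InstanceOption("r5.2xlarge", 8, 64),
    InstanceOption("r5.4xlarge", 16, 128),
]

# Hypothetical cutoff for what counts as a short-running job.
SPOT_MAX_RUNTIME_MIN = 60


def choose_instance(pred_vcpus: int, pred_memory_gib: int) -> Optional[InstanceOption]:
    """Return the smallest catalog entry that satisfies the predicted needs."""
    fitting = [
        option for option in CATALOG
        if option.vcpus >= pred_vcpus and option.memory_gib >= pred_memory_gib
    ]
    return min(fitting, key=lambda o: (o.vcpus, o.memory_gib), default=None)


def purchase_option(pred_runtime_min: float) -> str:
    """Use Spot for jobs predicted to finish quickly, On-Demand otherwise."""
    return "spot" if pred_runtime_min <= SPOT_MAX_RUNTIME_MIN else "on-demand"


choice = choose_instance(pred_vcpus=6, pred_memory_gib=40)
print(choice.name, purchase_option(pred_runtime_min=25))
```

Sorting by vCPUs and then memory is a stand-in for a real cost comparison, which would need current pricing data.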
Wipro-Nuage ensures the environment scales out as necessary. Every environment is a combination of static instances that are part of the data center on the cloud and dynamically provisioned instances that are created on demand. Wipro-Nuage allocates jobs so that reserved and static instances are utilized first, before any dynamic instances are created.
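A static-first allocation policy of this kind can be sketched as below. The `Host` fields, the priority table, and the best-fit tie-breaker are assumptions for illustration; the post only states that reserved and static capacity is used before dynamic capacity is created.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    kind: str  # "reserved", "static", or "dynamic"
    free_vcpus: int
    free_memory_gib: int


# Lower number wins: reserved and static capacity is preferred over
# dynamic hosts that happen to be running already.
PRIORITY = {"reserved": 0, "static": 0, "dynamic": 1}


def pick_host(hosts: List[Host], vcpus: int, memory_gib: int) -> Optional[Host]:
    """Pick an existing host for the job, or None if a new one is needed."""
    candidates = [
        h for h in hosts
        if h.free_vcpus >= vcpus and h.free_memory_gib >= memory_gib
    ]
    # Among equally preferred hosts, take the tightest memory fit (an assumption).
    return min(
        candidates,
        key=lambda h: (PRIORITY[h.kind], h.free_memory_gib),
        default=None,
    )


hosts = [
    Host("dyn-1", "dynamic", 8, 64),
    Host("static-1", "static", 16, 128),
    Host("static-2", "static", 4, 16),
]
picked = pick_host(hosts, vcpus=8, memory_gib=32)
print(picked.name if picked else "create a dynamic instance")
```

Returning `None` is the signal, in this sketch, that no existing host fits and a dynamic instance would have to be provisioned.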
Wipro-Nuage automates the process of identifying the right resource requirements and ensures that only the right amount of compute or memory resources is reserved during electronic design automation. It uses Amazon SageMaker to build, train, and deploy machine learning (ML) models that help identify the right size for each workload.
When a design engineer submits a job, a Wipro-Nuage agent invokes an AWS Lambda function responsible for orchestrating the process of identifying the right size for the workload. If an instance is not available in the pool, Nuage will use an AWS CloudFormation template to create the right-sized instance for the job.
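The flow in the paragraph above could be wired into a Lambda handler along the following lines. Only the `lambda_handler(event, context)` signature is standard; the event shape, the `build_handler` helper, and the three injected callables (standing in for the prediction call, the pool lookup, and the CloudFormation-based provisioning step) are assumptions for illustration.

```python
from typing import Callable, Optional


def build_handler(
    predict: Callable[[dict], dict],
    find_host: Callable[[dict], Optional[str]],
    provision: Callable[[dict], str],
):
    """Assemble a Lambda-style handler from three injected steps."""

    def lambda_handler(event, context):
        job = event["job"]          # assumed event shape: {"job": {...}}
        size = predict(job)         # right-size the workload
        host = find_host(size)      # look for a matching instance in the pool
        action = "reuse"
        if host is None:
            host = provision(size)  # stand-in for the template-driven create
            action = "provision"
        return {"job_id": job["id"], "host": host, "action": action, "size": size}

    return lambda_handler


# Stub wiring: the pool is empty, so the handler has to provision.
handler = build_handler(
    predict=lambda job: {"vcpus": 8, "memory_gib": 64},
    find_host=lambda size: None,
    provision=lambda size: "host-hypothetical-1",
)
result = handler({"job": {"id": "job-42", "tool": "sim"}}, None)
print(result["action"], result["host"])
```

Injecting the steps keeps the orchestration logic testable without AWS access; a deployed function would replace the stubs with real service calls.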
Figure 3 – Wipro-Nuage solution architecture and cloud deployment.
In the next section, we’ll provide an overview of Wipro-Nuage solution architecture and its components.
Smart Agent integrates with the Scheduler Engine, listening to the grid for new workload and job requests. It interacts with the scheduler through native calls to pick up submitted jobs, query their status, and modify them to use the right host.
Smart Orchestrator is one of the core components of Wipro-Nuage and consists of the following sub-components:
- Workflow Manager: This is the key component of the Smart Orchestrator. It picks up the jobs and has the intelligence to determine the size, the instance type, and the correct instance from those available in the ecosystem. As required, it invokes the Resource Manager to create the instance. With the right host identified, it asks the Smart Agent to schedule the job on that host.
- Prediction Engine: The Prediction Engine is built on scikit-learn. When a request comes in from the Orchestrator’s Workflow Manager, it invokes the right model based on the metadata of the submitted job.
- Resource Manager: When the Workflow Manager determines that the right instance is not available in the infrastructure, the Resource Manager dynamically creates one. Amazon Machine Image (AMI) templates are created with the toolsets and prerequisites required for the environment. For the job to run, the AWS Cloud Development Kit (CDK) is used to create dynamic CloudFormation templates, which provision instances with the correct instance family and AMI. By leveraging Spot instances to run the jobs, the Resource Manager can help companies save on resource costs.
- Scheduler: Periodically, the Workflow Manager needs to perform activities related to the resources and resource-specific data. These are scheduled as separate periodic jobs on the scheduler and are responsible for: cleanup of the unused hosts, aggregating Wipro-Nuage orchestrator data and pushing it out to the data lake/repository, and triggering the workflow to initiate model development to ensure a performant model.
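As an illustration of how a prediction engine might pick a model from job metadata, the sketch below routes each job to a model registered for its tool and flow type, with a fallback. The registry class, the metadata keys, and the `cell_count_millions` feature are hypothetical; plain callables stand in for fitted scikit-learn estimators.

```python
from typing import Callable, Dict, Optional, Tuple

# A model here is any callable mapping job metadata to predicted memory (GiB).
Model = Callable[[dict], float]


class ModelRegistry:
    """Route a job to the model registered for its (tool, flow_type)."""

    def __init__(self, default: Model):
        self._models: Dict[Tuple[Optional[str], Optional[str]], Model] = {}
        self._default = default

    def register(self, tool: str, flow_type: str, model: Model) -> None:
        self._models[(tool, flow_type)] = model

    def predict(self, job: dict) -> float:
        key = (job.get("tool"), job.get("flow_type"))
        # Fall back to a conservative default when no specific model exists.
        model = self._models.get(key, self._default)
        return model(job)


registry = ModelRegistry(default=lambda job: 32.0)
registry.register("sim", "regression", lambda job: 0.5 * job["cell_count_millions"])

known = registry.predict(
    {"tool": "sim", "flow_type": "regression", "cell_count_millions": 100}
)
unknown = registry.predict({"tool": "other", "flow_type": "signoff"})
print(known, unknown)
```

Keying on tool and flow type reflects the earlier point that estimates without those details are unreliable; a real engine may use richer metadata.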
The Smart Modeler is a standalone component that creates and recalibrates models using historical data. As newer data becomes available, it recalibrates the model either periodically or when the model’s accuracy deteriorates. It uses MLflow to manage the model lifecycle. In a cloud environment, the model is deployed on Amazon SageMaker.
The modeler is deployed using CloudFormation templates and comes up when the model needs to be created or refined.
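One simple way to decide when recalibration is due is to combine an age limit with an error threshold, as sketched below. The choice of mean absolute percentage error (MAPE), the 20% threshold, and the 30-day age limit are assumptions for illustration; the post does not say which metric or limits the Smart Modeler uses.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, skipping zero-valued actuals."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to compare against")
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def needs_recalibration(
    actual: Sequence[float],
    predicted: Sequence[float],
    days_since_training: int,
    max_error: float = 0.20,   # hypothetical accuracy threshold
    max_age_days: int = 30,    # hypothetical periodic refresh interval
) -> bool:
    """Recalibrate when the model is old or its recent error is too high."""
    too_old = days_since_training >= max_age_days
    return too_old or mape(actual, predicted) > max_error


# Peak memory (GiB) observed for recent jobs versus what was predicted.
actual = [10.0, 20.0, 40.0]
fresh_and_accurate = needs_recalibration(actual, [12.0, 18.0, 30.0], days_since_training=5)
fresh_but_drifting = needs_recalibration(actual, [15.0, 18.0, 30.0], days_since_training=5)
print(fresh_and_accurate, fresh_but_drifting)
```

A check like this could run as one of the periodic jobs that trigger model development.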
As design engineers submit their jobs, data on job performance is collected. A self-service environment provides end users with the following views:
- Performance of the queues: How many slots are used, and what percentage of jobs are in active/exit/zombie status.
- Performance of the hosts: Utilization and consumption of the individual hosts.
- Performance of the licenses: Utilization and consumption of licenses of the tools and their features.
- Performance of the job: Job status and utilization.
- Performance of the model: How the models are performing with respect to the error.
It also offers some self-service capabilities. For example, a design engineer can choose to terminate a job that is not performing as expected.
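The queue view listed above boils down to a few simple aggregates. The following sketch computes them from raw job statuses and slot counts; the function names and the input format are assumptions, while the status labels come from the list above.

```python
from collections import Counter
from typing import Dict, Sequence


def status_percentages(statuses: Sequence[str]) -> Dict[str, float]:
    """Share of jobs in each status, as a percentage of all jobs."""
    total = len(statuses)
    if total == 0:
        return {}
    counts = Counter(statuses)
    return {status: 100.0 * n / total for status, n in counts.items()}


def slot_utilization(used_slots: int, total_slots: int) -> float:
    """Percentage of queue slots currently in use."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    return 100.0 * used_slots / total_slots


queue_statuses = ["active", "active", "exit", "zombie"]
shares = status_percentages(queue_statuses)
print(shares, slot_utilization(used_slots=30, total_slots=40))
```

A rising share of zombie jobs is the kind of signal that might prompt an engineer to terminate a job through the self-service option.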
Summary of Benefits
With Wipro-Nuage, semiconductor companies can move away from fractured processes, experiences, and data sets to gain the following benefits:
- Integrated processes: Wipro-Nuage can be integrated with organizational tools, policies, and processes, making for a seamless and frictionless adoption.
- Automated environment: No manual intervention is needed to integrate various in-house tools, thus eliminating errors.
- Efficient and scalable systems: As new tools, instances, server types, and versions get added, the tool learns, allowing it to scale and optimize effectively and efficiently.
- Cost-optimized infrastructure: Wipro-Nuage eliminates waste with its automated optimal selection of servers and automatic use of Spot instances when appropriate. It can potentially increase the utilization of infrastructure resources by 100% and reduce IT costs by 20-30%.
In this post, we have shown how Wipro-Nuage leverages AWS infrastructure to enable organizations to run their high-performance computing (HPC) services on the cloud. In the process, organizations gain an automated, efficient, and scalable system that cost-optimizes electronic design automation (EDA) workloads as they migrate to AWS.
With the Wipro-Nuage smart orchestrator solution, Wipro is accelerating cloud adoption for EDA customers and for any HPC customer that wants to take advantage of cloud elasticity.
Wipro – AWS Partner Spotlight
Wipro is an AWS Premier Tier Services Partner and MSP that harnesses the power of cognitive computing, hyper-automation, robotics, cloud, analytics, and emerging technologies to help clients adapt to the digital world.