ML Lifecycle
Do you need help moving your organization’s machine learning (ML) journey from pilot to production? You’re not alone. Most executives think ML can apply to any business decision, but on average only half of ML projects make it to production.
This post describes how to implement your first ML use case using Amazon SageMaker in just 8–12 weeks by leveraging a methodology called Experience-based Acceleration (EBA).
Customers may face several challenges when implementing ML solutions.
Machine learning EBA is a 3-day, sprint-based, interactive workshop (called a party) that uses SageMaker to accelerate business outcomes by guiding you through an accelerated and prescriptive ML lifecycle. It starts with identifying business goals and ML problem framing, and takes you through data processing, model development, production deployment, and monitoring.
The following visual illustrates a sample ML lifecycle.
Two primary customer scenarios apply. The first uses low-code or no-code ML services such as Amazon SageMaker Canvas, Amazon SageMaker Data Wrangler, Amazon SageMaker Autopilot, and Amazon SageMaker JumpStart to help data analysts prepare data, build models, and generate predictions. The second uses SageMaker to help data scientists and ML engineers build, train, and deploy custom ML models.
We recognize that customers have different starting points. If you’re starting from scratch, it’s often simpler to begin with low-code or no-code solutions and gradually transition to developing custom models. In contrast, if you have an existing on-premises ML infrastructure, you can begin directly by using SageMaker to alleviate challenges with your current solution.
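For the second scenario, the workflow of building, training, and deploying a custom model with the SageMaker Python SDK might look like the following minimal sketch. The training script, S3 paths, and IAM role shown here are hypothetical placeholders, not values from this post:

```python
# Minimal sketch of scenario two: build, train, and deploy a custom model
# with the SageMaker Python SDK. The training script, S3 paths, and IAM
# role are hypothetical placeholders.
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder IAM role

# Run your own training script as a managed SageMaker training job
estimator = SKLearn(
    entry_point="train.py",            # your training script (placeholder)
    framework_version="1.2-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role=role,
    sagemaker_session=session,
)
estimator.fit({"train": "s3://your-bucket/prepared-data/train/"})  # placeholder S3 path

# Deploy the trained model behind a real-time inference endpoint
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
```

The same estimator-based pattern applies to other frameworks (for example PyTorch, TensorFlow, or XGBoost) and to bring-your-own-container workflows.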
Through ML EBA, experienced AWS ML subject matter experts work side by side with your cross-functional team to provide prescriptive guidance, remove blockers, and build organizational capability for continued ML adoption. The party steers you toward solving a compelling business problem rather than thinking in terms of data and ML technology environments. Additionally, the party gets you started on driving material business value from untapped data.
ML EBA helps you to think big, start small, and scale fast. Although it creates a minimum viable ML model in 3 days, there are 4–6 weeks of preparation leading up to the EBA. Furthermore, you spend 4–6 weeks post-EBA to fine-tune the model with additional feature engineering and hyperparameter optimization before production deployment.
Let’s dive into what the whole process looks like and how you can use the ML EBA methodology to address the common blockers.
In this section, we detail the 4–6 weeks of preparation leading up to the EBA.
The first step is to frame and qualify the ML problem, which includes the following:
The AI Use Case Explorer is a good starting point to explore the right use cases by industry, business function, or desired business outcome and discover relevant customer success stories.
The next step is to identify the teams needed to support the EBA effort. Commonly, the work is split up between the following workstreams:
After these efforts are complete, it’s time to transition into action. A standard baseline 4-week timeline should be followed strictly to make sure the EBA stays on track. Experienced AWS subject matter experts will guide and coach you through this preparation leading up to the EBA party.
Every customer is different; AWS helps you curate a technical plan of activities to be completed in the next 4 weeks leading up to the party.
AWS conducts Immersion Days to inspire your builders and build momentum for the party. An Immersion Day is a half-day or full-day workshop with the right mix of presentations, hands-on labs, and Q&A to introduce AWS services or solutions. AWS will help you select the right Immersion Days from the AI/ML Workshops catalog.
We recognize that every builder in your organization is at a different level. We recommend that your builders use the ML ramp-up guide resources, or digital or classroom training, to start where they are and build the necessary skills for the party.
Your cloud and data engineering teams should work on the following with guidance from AWS:
Your data science team should work on the following with guidance from AWS:
AWS works with you to assess go/no-go readiness for technical activities, skills, and momentum for the party. Then we solidify the scope for the 3-day party, prioritizing progress over perfection.
Although the EBA party itself is customized for your organization, the recommended agenda for the 3 days is shown in the following table. You will learn by doing during the EBA with guidance from AWS subject matter experts.
| . | Day 1 | Day 2 | Day 3 |
| --- | --- | --- | --- |
| Data Science | AM: Try Autopilot or JumpStart models (see the sketch after this table). PM: Pick 1–2 models based on Autopilot outcomes to experiment further. | Improve model accuracy through experimentation, feature engineering, and hyperparameter tuning. | Quality assurance and validation with test data. Deploy to production (inference endpoint). Set up monitoring (model and data drift). |
| Data Engineering | Explore using a feature store for future ML use cases. Create a backlog of items for data governance and associated guardrails. | | |
| Cloud/MLOps Engineering | Evaluate the MLOps framework solution library. Assess whether it can be used for a repeatable MLOps framework. Identify gaps and create a backlog of items to enhance the solution library or create your own MLOps framework. | Implement backlog items to create a repeatable MLOps framework. | Continue implementing backlog items to create a repeatable MLOps framework. |
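To make the Day 1 data science activity concrete, the following is a minimal sketch of launching a SageMaker Autopilot job with the SageMaker Python SDK and deploying its best candidate. The S3 locations, target column, and IAM role are hypothetical placeholders:

```python
# Minimal sketch of the Day 1 activity: launch a SageMaker Autopilot (AutoML)
# job and deploy the best candidate. The S3 paths, target column, and IAM
# role are hypothetical placeholders.
from sagemaker.automl.automl import AutoML

automl = AutoML(
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    target_attribute_name="target",           # column to predict (placeholder)
    max_candidates=10,                        # cap candidates to keep the run short
    output_path="s3://your-bucket/autopilot-output/",
)

# Autopilot explores preprocessing, algorithms, and hyperparameters automatically
automl.fit(inputs="s3://your-bucket/prepared-data/train.csv", wait=True)

# Inspect the best candidate, then deploy it to a real-time endpoint
print(automl.best_candidate()["CandidateName"])
predictor = automl.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
```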
ML involves extensive experimentation, and it’s common not to reach your desired model accuracy during the 3-day EBA. Therefore, creating a well-defined backlog or path to production is essential, including improving model accuracy through experimentation, feature engineering, hyperparameter optimization, and production deployment. AWS will continue to assist you through production deployment.
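As one example of what that post-EBA backlog work can look like, the following minimal sketch uses SageMaker automatic model tuning to run hyperparameter optimization against a built-in XGBoost estimator. The estimator, objective metric, parameter ranges, and S3 paths are illustrative assumptions; substitute the model you built during the EBA:

```python
# Minimal sketch of post-EBA hyperparameter optimization with SageMaker
# automatic model tuning. The estimator, metric, ranges, and S3 paths are
# illustrative assumptions.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Built-in XGBoost used as an example estimator; substitute your EBA model
xgb_image = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")
estimator = Estimator(
    image_uri=xgb_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://your-bucket/model-artifacts/",
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=200)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",   # emitted by the built-in XGBoost container
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    objective_type="Maximize",
    max_jobs=20,            # total training jobs in the tuning run
    max_parallel_jobs=4,    # jobs run concurrently
)

tuner.fit({
    "train": TrainingInput("s3://your-bucket/train/", content_type="text/csv"),
    "validation": TrainingInput("s3://your-bucket/validation/", content_type="text/csv"),
})
print(tuner.best_training_job())  # review the best candidate before promoting to production
```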
By complementing the ML EBA methodology with SageMaker, you can achieve the following results:
Contact your AWS account team (Account Manager or Customer Solutions Manager) to learn more and get started.