This post is co-written with Marc Neumann, Amor Steinberg and Marinus Krommenhoek from BMW Group.
The BMW Group – headquartered in Munich, Germany – is driven by 149,000 employees worldwide and manufactures in over 30 production and assembly facilities across 15 countries. Today, the BMW Group is the world’s leading manufacturer of premium automobiles and motorcycles, and provider of premium financial and mobility services. The BMW Group sets trends in production technology and sustainability as an innovation leader with an intelligent material mix, a technological shift towards digitalization, and resource-efficient production.
In an increasingly digital and rapidly changing world, BMW Group’s business and product development strategies rely heavily on data-driven decision-making. With that, the need for data scientists and machine learning (ML) engineers has grown significantly. These skilled professionals are tasked with building and deploying models that improve the quality and efficiency of BMW’s business processes and enable informed leadership decisions.
Data scientists and ML engineers require capable tooling and sufficient compute for their work. Therefore, BMW established a centralized ML/deep learning infrastructure on premises several years ago and continuously upgraded it. To pave the way for the growth of AI, BMW Group needed to make a leap regarding scalability and elasticity while reducing operational overhead, software licensing, and hardware management.
In this post, we will talk about how BMW Group, in collaboration with AWS Professional Services, built its Jupyter Managed (JuMa) service to address these challenges. JuMa is a service of BMW Group’s AI platform for its data analysts, ML engineers, and data scientists that provides a user-friendly workspace with an integrated development environment (IDE). It is powered by Amazon SageMaker Studio and provides JupyterLab for Python and Posit Workbench for R. This offering enables BMW ML engineers to perform code-centric data analytics and ML, increases developer productivity by providing self-service capability and infrastructure automation, and tightly integrates with BMW’s centralized IT tooling landscape.
JuMa is now available to all data scientists, ML engineers, and data analysts at BMW Group. The service streamlines ML development and production workflows (MLOps) across BMW by providing a cost-efficient and scalable development environment that facilitates seamless collaboration between data science and engineering teams worldwide. This results in faster experimentation and shorter idea validation cycles. Moreover, the JuMa infrastructure, which is based on AWS serverless and managed services, helps reduce operational overhead for DevOps teams and allows them to focus on enabling use cases and accelerating AI innovation at BMW Group.
Prior to introducing the JuMa service, BMW teams worldwide were using two on-premises platforms that provided teams JupyterHub and RStudio environments. These platforms were too limited regarding CPU, GPU, and memory to allow the scalability of AI at BMW Group. Scaling these platforms with managing more on-premises hardware, more software licenses, and support fees would require significant up-front investments and high efforts for its maintenance. To add to this, limited self-service capabilities were available, requiring high operational effort for its DevOps teams. More importantly, the use of these platforms was misaligned with BMW Group’s IT cloud-first strategy. For example, teams using these platforms missed an easy migration of their AI/ML prototypes to the industrialization of the solution running on AWS. In contrast, the data science and analytics teams already using AWS directly for experimentation needed to also take care of building and operating their AWS infrastructure while ensuring compliance with BMW Group’s internal policies, local laws, and regulations. This included a range of configuration and governance activities from ordering AWS accounts, limiting internet access, using allowed listed packages to keeping their Docker images up to date.
JuMa is a fully managed multi-tenant, security hardened AI platform service built on AWS with SageMaker Studio at the core. By relying on AWS serverless and managed services as the main building blocks of the infrastructure, the JuMa DevOps team doesn’t need to worry about patching servers, upgrading storage, or managing any other infrastructure components. The service handles all those processes automatically, providing a powerful technical platform that is generally up to date and ready to use.
JuMa users can effortlessly order a workspace via a self-service portal to create a secure and isolated development and experimentation environment for their teams. After a JuMa workspace is provisioned, the users can launch JupyterLab or Posit workbench environments in SageMaker Studio with just a few clicks and start the development immediately, using the tools and frameworks they are most familiar with. JuMa is tightly integrated with a range of BMW Central IT services, including identity and access management, roles and rights management, BMW Cloud Data Hub (BMW’s data lake on AWS) and on-premises databases. The latter helps AI/ML teams seamlessly access required data, given they are authorized to do so, without needing to build data pipelines. Furthermore, the notebooks can be integrated into the corporate Git repositories to collaborate using version control.
The solution abstracts away all technical complexities associated with AWS account management, configuration, and customization for AI/ML teams, allowing them to fully focus on AI innovation. The platform ensures that the workspace configuration meets BMW’s security and compliance requirements out of the box.
The following diagram describes the high-level context view of the architecture.
BMW AI/ML team members can order their JuMa workspace using BMW’s standard catalog service. After approval by the line manager, the ordered JuMa workspace is provisioned by the platform fully automatedly. The workspace provisioning workflow includes the following steps (as numbered in the architecture diagram).
After an AI model is developed and validated in JuMa, AI teams can use the MLOPs service of the BMW AI platform to deploy it to production quickly and effortlessly. This service provides users with a production-grade ML infrastructure and pipelines on AWS using SageMaker, which can be set up in minutes with just a few clicks. Users simply need to host their model on the provisioned infrastructure and customize the pipeline to meet their specific use case needs. In this way, the AI platform covers the entire AI lifecycle at BMW Group.
Following best practice architecting on AWS, the JuMa service was designed and implemented according to the AWS Well-Architected Framework. Architectural decisions of each Well-Architected pillar are described in detail in the following sections.
To assure full isolation between the tenants, each workspace receives its own AWS account, where the authorized users can jointly collaborate on analytics tasks as well as on developing and experimenting with AI/ML models. The JuMa portal itself enforces isolation at runtime using policy-based isolation with AWS Identity and Access Management (IAM) and the JuMa user’s context. For more information about this strategy, refer to Run-time, policy-based isolation with IAM.
Data scientists can only access their domain through the BMW network via pre-signed URLs generated by the portal. Direct internet access is disabled within their domain. Their Sagemaker domain privileges are built using Amazon SageMaker Role Manager personas to ensure least privilege access to AWS services needed for the development such as SageMaker, Amazon Athena, Amazon Simple Storage Service (Amazon S3), and AWS Glue. This role implements ML guardrails (such as those described in Governance and control), including enforcement of ML training to occur in either Amazon Virtual Private Cloud (Amazon VPC) or without internet and allowing only the use of JuMa’s custom vetted and up-to-date SageMaker images.
Because JuMa is designed for development, experimentation, and ad-hoc analysis, it implements retention policies to remove data after 30 days. To access data whenever needed and store it for long term, JuMa seamlessly integrates with the BMW Cloud Data Hub and BMW on-premises databases.
Finally, JuMa supports multiple Regions to comply to special local legal situations which, for example, require it to process data locally to enable BMW’s data sovereignty.
Both the JuMa platform backend and workspaces are implemented with AWS serverless and managed services. Using those services helps minimize the effort of the BMW platform team maintaining and operating the end-to-end solution, striving to be a no-ops service. Both the workspace and portal are monitored using Amazon CloudWatch logs, metrics, and alarms to check key performance indicators (KPIs) and proactively notify the platform team of any issues. Additionally, the AWS X-Ray distributed tracing system is used to trace requests throughout multiple components and annotate CloudWatch logs with workspace-relevant context.
All changes to the JuMa infrastructure are managed and implemented through automation using infrastructure as code (IaC). This helps reduce manual efforts and human errors, increase consistency, and ensure reproducible and version-controlled changes across both JuMa platform backend workspaces. Specifically, all workspaces are provisioned and updated through an onboarding process built on top of AWS Step Functions, AWS CodeBuild, and Terraform. Therefore, no manual configuration is required to onboard new workspaces to the JuMa platform.
By using AWS serverless services, JuMa ensures on-demand scalability, pre-approved instance sizes, and a pay-as-you-go model for the resources used during the development and experimentation activities per the AI/ML teams’ needs. To further optimize costs, the JuMa platform monitors and identifies idle resources within SageMaker Studio and shuts them down automatically to prevent expenses for non-utilized resources.
JuMa replaces BMW’s two on-premises platforms for analytics and deep learning workloads that consume a considerable amount of electricity and produce CO2 emissions even when not in use. By migrating AI/ML workloads from on premises to AWS, BMW will slash its environmental impact by decommissioning the on-premises platforms.
Furthermore, the mechanism for auto shutdown of idle resources, data retention polices, and the workspace usage reports to its owners implemented in JuMa help further minimize the environmental footprint of running AI/ML workloads on AWS.
By using SageMaker Studio, BMW teams benefit from an easy adoption of the latest SageMaker features that can help accelerate their experimentation. For example, they can use Amazon SageMaker JumpStart capabilities to use the latest pre-trained, open source models. Additionally, it helps reduce AI/ML team efforts moving from experimentation to solution industrialization, because the development environment provides the same AWS core services but restricted to development capabilities.
SageMaker Studio domains are deployed in a VPC-only mode to manage internet access and only allow access to intended AWS services. The network is deployed in two Availability Zones to protect against a single point of failure, achieving greater resiliency and availability of the platform to its users.
Changes to JuMa workspaces are automatically deployed and tested to development and integration environments, using IaC and CI/CD pipelines, before upgrading customer environments.
Finally, data stored in Amazon Elastic File System (Amazon EFS) for SageMaker Studio domains is kept after volumes are deleted for backup purposes.
In this post, we described how BMW Group in collaboration with AWS ProServe developed a fully managed AI platform service on AWS using SageMaker Studio and other AWS serverless and managed services.
With JuMa, BMW’s AI/ML teams are empowered to unlock new business value by accelerating experimentation as well as time-to-market for disruptive AI solutions. Furthermore, by migrating from its on-premises platform, BMW can reduce the overall operational efforts and costs while also increasing sustainability and the overall security posture.
To learn more about running your AI/ML experimentation and development workloads on AWS, visit Amazon SageMaker Studio.
TL;DR A conversation with 4o about the potential demise of companies like Anthropic. As artificial…
Whether a company begins with a proof-of-concept or live deployment, they should start small, test…
Digital tools are not always superior. Here are some WIRED-tested agendas and notebooks to keep…
Machine learning (ML) models are built upon data.
Editor’s note: This is the second post in a series that explores a range of…
David J. Berg*, David Casler^, Romain Cledat*, Qian Huang*, Rui Lin*, Nissan Pow*, Nurcan Sonmez*,…