Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. Amazon SageMaker Canvas is a low-code no-code visual interface to build and deploy ML models without the need to write code. Based on customers’ feedback, we have combined the advanced ML-specific data preparation capabilities of SageMaker Data Wrangler inside SageMaker Canvas, providing users with an end-to-end, no-code workspace for preparing data, and building and deploying ML models.
By abstracting away much of the complexity of the ML workflow, SageMaker Canvas enables you to prepare data, then build or use a model to generate highly accurate business insights without writing code. Additionally, preparing data in SageMaker Canvas offers many enhancements, such as page loads up to 10 times faster, a natural language interface for data preparation, the ability to view the data size and shape at every step, and improved replace and reorder transforms to iterate on a data flow. Finally, you can one-click create a model in the same interface, or create a SageMaker Canvas dataset to fine-tune foundation models (FMs).
This post demonstrates how you can bring your existing SageMaker Data Wrangler flows—the instructions created when building data transformations—from SageMaker Studio Classic to SageMaker Canvas. We provide an example of moving files from SageMaker Studio Classic to Amazon Simple Storage Service (Amazon S3) as an intermediate step before importing them into SageMaker Canvas.
The high-level steps are as follows:
In this example, we use a folder called data-wrangler-classic-flows
as a staging folder for migrating flow files to Amazon S3. It is not necessary to create a migration folder, but in this example, the folder was created using the file system browser portion of SageMaker Studio Classic. After you create the folder, take care to move and consolidate relevant SageMaker Data Wrangler flow files together. In the following screenshot, three flow files necessary for migration have been moved into the folder data-wrangler-classic-flows,
as seen in the left pane. One of these files, titanic.flow
, is opened and visible in the right pane.
To copy the flow files to Amazon S3, complete the following steps:
The following screenshot shows an example of what the Amazon S3 sync process should look like. You will get a confirmation after all files are uploaded. You can adjust the preceding code to meet your unique input folder and Amazon S3 location needs. If you don’t want to create a folder, when you enter the terminal, simply skip the change directory (cd
) command, and all flow files on your entire SageMaker Studio Classic file system will be copied to Amazon S3, regardless of origin folder.
After you upload the files to Amazon S3, you can validate that they have been copied using the Amazon S3 console. In the following screenshot, we see the original three flow files, now in an S3 bucket.
To import the flow files into SageMaker Canvas, complete the following steps:
After you import the files, the SageMaker Data Wrangler page will refresh to show the newly imported files, as shown in the following screenshot.
Choose one of the flows (for this example, we choose titanic.flow
) to launch the SageMaker Data Wrangler transformation.
Now you can add analyses and transformations to the data flow using a visual interface (Accelerate data preparation for ML in Amazon SageMaker Canvas) or natural language interface (Use natural language to explore and prepare data with a new capability of Amazon SageMaker Canvas).
When you’re happy with the data, choose the plus sign and choose Create model, or choose Export to export the dataset to build and use ML models.
This post has provided guidance on using Amazon S3 to migrate SageMaker Data Wrangler flow files from a SageMaker Studio Classic environment. Phase 3: (Optional) Migrate data from Studio Classic to Studio provides a second method that uses your local machine to transfer the flow files. Furthermore, you can download single flow files from the SageMaker Studio tree control to your local machine, then import them manually in SageMaker Canvas. Choose the method that suits your needs and use case.
When you’re done, shut down any running SageMaker Data Wrangler applications in SageMaker Studio Classic. To save costs, you can also remove any flow files from the SageMaker Studio Classic file browser, which is an Amazon Elastic File System (Amazon EFS) volume. You can also delete any of the intermediate files in Amazon S3. After the flow files are imported into SageMaker Canvas, the files copied to Amazon S3 are no longer needed.
You can log out of SageMaker Canvas when you’re done, then relaunch it when you’re ready to use it again.
Migrating your existing SageMaker Data Wrangler flows to SageMaker Canvas is a straightforward process that allows you to use the advanced data preparations you’ve already developed while taking advantage of the end-to-end, low-code no-code ML workflow of SageMaker Canvas. By following the steps outlined in this post, you can seamlessly transition your data wrangling artifacts to the SageMaker Canvas environment, streamlining your ML projects and enabling business analysts and non-technical users to build and deploy models more efficiently.
Start exploring SageMaker Canvas today and experience the power of a unified platform for data preparation, model building, and deployment!
This post is co-written with Steven Craig from Hearst. To maintain their competitive edge, organizations…
Conspiracy theories about missing votes—which are not, in fact, missing—and something being “not right” are…
Researchers have developed AI-driven mobile robots that can carry out chemical synthesis research with extraordinary…
In recent years, roboticists have introduced robotic systems that can complete missions in various environments,…
Overwhelmed by manual tasks and data overload? Streamline your business and boost revenue with the…
In real life, the machine learning model is not a standalone object that only produces…