AmazonTextract Blog

Improve data extraction and document processing with Amazon Textract

Intelligent document processing (IDP) has seen widespread adoption across enterprise and government organizations. Gartner estimates the IDP market will grow more than 100% year over year, and is projected to reach $4.8 billion in 2022. IDP helps transform structured, semi-structured, and unstructured data from a variety of document formats into actionable information. Processing unstructured data …

wrangler image

Automated exploratory data analysis and model operationalization framework with a human in the loop

Identifying, collecting, and transforming data is the foundation for machine learning (ML). According to a Forbes survey, there is widespread consensus among ML practitioners that data preparation accounts for approximately 80% of the time spent in developing a viable ML model. In addition, many of our customers face several challenges during the model operationalization phase …

image001 8

Move Amazon SageMaker Autopilot ML models from experimentation to production using Amazon SageMaker Pipelines

Amazon SageMaker Autopilot automatically builds, trains, and tunes the best custom machine learning (ML) models based on your data. It’s an automated machine learning (AutoML) solution that eliminates the heavy lifting of handwritten ML models that requires ML expertise. Data scientists need to only provide a tabular dataset and select the target column to predict, …

Discover why leaders need to upskill teams in ML, AI and data

Tech companies are ramping up the search for highly skilled data analytics, AI and ML professionals, with the race to AI accelerating the crunch.1 They are looking for cloud experts  who can successfully build, test, run, and manage complex tools and infrastructure, in roles such as data analysts, data engineers, data scientists, and ML engineers. …

Unearthing Data: Vision AI Startup Digs Into Digital Twins for Mining and Construction

Skycatch, a San Francisco-based startup, has been helping companies mine both data and minerals for nearly a decade. The software-maker is now digging into the creation of digital twins, with an initial focus on the mining and construction industry, using the NVIDIA Omniverse platform for connecting and building custom 3D pipelines. SkyVerse, which is a …

Dale Durran 378x400 1

Stormy Weather? Scientist Sharpens Forecasts With AI

Editor’s note: This is the first in a series of blogs on researchers advancing science in the expanding universe of high performance computing. A perpetual shower of random raindrops falls inside a three-foot metal ring Dale Durran erected outside his front door (shown above). It’s a symbol of his passion for finding order in the …

12A2F7t91r3ldIiiXgxykXs5w

Data Pipeline Version Control: Tracking Code & Data Together (Palantir RFx Blog Series, #3)

Editor’s note: This is the third post in the Palantir RFx Blog Series, which explores how organizations can better craft RFIs and RFPs to evaluate digital transformation software. Each post focuses on one key capability area within a data ecosystem, with the goal of helping companies ask the right questions to better assess technology. Introduction …

ML 9275 image001

Cost efficient ML inference with multi-framework models on Amazon SageMaker 

Machine learning (ML) has proven to be one of the most successful and widespread applications of technology, affecting a wide range of industries and impacting billions of users every day. With this rapid adoption of ML into every industry, companies are facing challenges in supporting low-latency predictions and with high availability while maximizing resource utilization …

ml 12002 image001

Solve business problems end-to-end through machine learning in Amazon SageMaker JumpStart solutions

Amazon SageMaker JumpStart provides pre-trained, open-source models for a wide range of problem types to help you get started with machine learning (ML). JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for ML with Amazon SageMaker. As a business user, you get to do the following …

Emily Webber

Train gigantic models with near-linear scaling using sharded data parallelism on Amazon SageMaker

In the pursuit of superior accuracy, deep learning models in areas such as natural language processing and computer vision have significantly grown in size in the past few years, frequently counted in tens to hundreds of billions of parameters. Training these gigantic models is challenging and requires complex distribution strategies. Data scientists and machine learning …