Storage technology explained: AI and data storage

Artificial intelligence (AI) and machine learning (ML) promise a step change in the automation fundamental to IT, with applications ranging from simple chatbots to content generation and control of almost unthinkable complexity.

Storage forms a key part of AI. It supplies data for training, stores the potentially huge volumes of data generated, and serves data during inference, when the results of AI are applied to real-world workloads.

In this article, we look at the key characteristics of AI workloads, their storage input/output (I/O) profile, the types of storage suited to AI, the suitability of cloud and object storage for AI, and storage supplier strategy and products for AI. 

What are the key features of AI workloads?

AI and ML are based on training an algorithm to detect patterns in data, gain insight into data and often to trigger responses based on those findings. Those could be very simple recommendations based on sales data, such as the “people who bought this also bought” type of recommendation. Or they could be the kind of complex content we see from large language models (LLMs) in generative AI (GenAI), trained on vast and varied datasets to create convincing text, images and video.

There are three key phases and deployment types to AI workloads:

  1. Training, where pattern recognition is worked into the algorithm from the AI model dataset, with varying degrees of human supervision;
  2. Inference, during which the patterns identified in the training phase are put to work, either in standalone AI deployments or as part of a wider application;
  3. Deployment of AI to an application or set of applications.

Where and how AI and ML workloads are trained and run can vary significantly. On the one hand, they can take the form of batch or one-off training and inference runs that resemble high-performance computing (HPC) processing on specific datasets in science and research environments. On the other hand, AI, once trained, can be applied to continuous application workloads, such as the types of sales and marketing operations described above.

The types of data in training and operational datasets could vary from a great many small files – for example, sensor readings in internet of things (IoT) workloads – to very large objects such as image and movie files or discrete batches of scientific data. File size upon ingestion also depends on the AI frameworks in use (see below).

Datasets could also form part of primary or secondary data storage, such as sales records or data held in backups, which is increasingly seen as a valuable source of corporate information.

What are the I/O characteristics of AI workloads?

Training and inferencing in AI workloads usually require massively parallel processing, using graphics processing units (GPUs) or similar hardware that offloads processing from central processing units (CPUs).

Processing performance needs to be exceptional to handle AI training and inference in a reasonable timeframe and with as many iterations as possible to maximise quality.

Infrastructure also potentially needs to be able to scale massively to handle very large training datasets and outputs from training and inference. It also requires speed of I/O between storage and processing, and potentially also to be able to manage portability of data between locations to enable the most efficient processing.

Data is likely to be unstructured and in large volumes, rather than structured and in databases.

What kind of storage do AI workloads need?

As we’ve seen, massively parallel processing using GPUs is the core of AI infrastructure. So, in short, the task of storage is to supply data to those GPUs as quickly as possible, to ensure these very costly hardware items are used optimally.

More often than not, that means flash storage for low latency in I/O. Capacity required will vary according to the scale of workloads and the likely scale of the results of AI processing, but hundreds of terabytes, even petabytes, is likely.

Adequate throughput is also a factor, as different AI frameworks store data differently – PyTorch, for example, tends towards a large number of smaller files, while TensorFlow tends towards fewer, larger ones. So, it’s not just a case of getting data to GPUs quickly, but also at the right volume and with the right I/O capabilities.
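As a rough illustration of why file size changes the storage requirement, the same aggregate throughput translates into very different request rates. The figures below are hypothetical, chosen only to show the contrast between a many-small-files profile and a large-shard profile:

```python
def io_profile(throughput_gbps: float, avg_file_mb: float) -> float:
    """Return the whole-file reads per second needed to sustain the given
    aggregate throughput, assuming each read fetches one complete file."""
    bytes_per_sec = throughput_gbps * 1e9
    file_bytes = avg_file_mb * 1e6
    return bytes_per_sec / file_bytes

# A many-small-files dataset stresses IOPS and metadata handling,
# while a dataset of large shards stresses raw bandwidth instead.
small_files = io_profile(10, 0.5)   # 10 GB/s over 0.5 MB files
large_files = io_profile(10, 200)   # 10 GB/s over 200 MB shards
print(f"{small_files:.0f} small-file reads/s vs {large_files:.0f} large reads/s")
```

At the same 10GBps, the small-file profile demands tens of thousands of read operations per second, which is why an array's IOPS and metadata performance matter as much as its headline bandwidth.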

Recently, storage suppliers have pushed flash-based storage – often using high-density QLC flash – as a potential general-purpose storage, including for datasets hitherto considered “secondary”, such as backup data, because customers may now want to access it at higher speed using AI.

Storage for AI projects will range from that which provides very high performance during training and inference to various forms of longer-term retention because it won’t always be clear at the outset of an AI project what data will be useful.

Is cloud storage good for AI workloads?

Cloud storage could be a viable consideration for AI workload data. Holding data in the cloud brings an element of portability, with data able to be “moved” nearer to its processing location.

Many AI projects start in the cloud because GPUs can be rented only for the time they are needed. The cloud is not cheap, but deploying hardware on-premise is hard to justify before a project has been committed to production.
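The rent-versus-buy trade-off comes down to simple arithmetic. The prices below are invented purely for illustration, and real comparisons would also include power, cooling, staff and depreciation:

```python
def breakeven_hours(onprem_capex: float, cloud_rate_per_hour: float) -> float:
    """Hours of cloud GPU rental at which cumulative rental cost matches
    the up-front cost of buying equivalent hardware (running costs of
    the on-premise kit are ignored for simplicity)."""
    return onprem_capex / cloud_rate_per_hour

# Hypothetical figures: a $250,000 eight-GPU server vs $40/hour rental.
hours = breakeven_hours(250_000, 40)
print(f"Break-even after {hours:.0f} rental hours (~{hours / 24:.0f} days of 24/7 use)")
```

Under these made-up numbers, an experiment running a few weeks is cheaper rented, while a production workload running around the clock for a year would justify the capital purchase.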

All the key cloud providers offer AI services, ranging from pre-trained models and application programming interfaces (APIs) into models, to AI/ML compute with scalable GPU deployment (Nvidia’s and their own) and storage infrastructure scalable to multiple petabytes.

Is object storage good for AI workloads?

Object storage is good for unstructured data, can scale massively, is often found in the cloud, and can handle almost any data type as an object. That makes it well-suited to the large, unstructured data workloads likely in AI and ML applications.

The presence of rich metadata is another plus to object storage. It can be searched and read to help find and organise the right data for AI training models. Data can be held almost anywhere, including in the cloud with communication via the S3 protocol.
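A minimal sketch of how metadata helps curate training data: rather than scanning object contents, a pipeline can select objects by their metadata fields alone. The object keys and metadata fields here are invented, and the listing is a plain Python structure standing in for what an S3-style store would return:

```python
# Simulated object-store listing: each entry pairs an object key with
# its user-defined metadata, as an S3-style listing plus head requests
# would surface it.
objects = [
    ("scans/img_001.png", {"label": "cat", "split": "train"}),
    ("scans/img_002.png", {"label": "dog", "split": "test"}),
    ("scans/img_003.png", {"label": "cat", "split": "train"}),
]

def select_for_training(listing, **wanted):
    """Return the keys whose metadata matches every requested field."""
    return [key for key, meta in listing
            if all(meta.get(k) == v for k, v in wanted.items())]

print(select_for_training(objects, split="train", label="cat"))
```

The same filter-by-metadata pattern is what makes rich object metadata valuable for assembling training sets, and also why heavy metadata traffic can load the storage controllers, as noted below.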

But metadata, for all its benefits, can also overwhelm storage controllers and affect performance. And, if the cloud is the location for object storage, cloud costs need to be taken into account as data is accessed and moved.

What do storage suppliers offer for AI?

Nvidia provides reference architectures and hardware stacks that include servers, GPUs and networking. These are the DGX BasePOD reference architecture and DGX SuperPOD turnkey infrastructure stack, which can be specified for industry verticals.

Storage suppliers have also focused on the I/O bottleneck so data can be delivered efficiently to large numbers of (very costly) GPUs.

Those efforts have ranged from integrations with Nvidia infrastructure – the key player in GPU and AI server technology – via microservices such as NeMo for training and NIM for inference, to storage product validation with AI infrastructure and entire storage infrastructure stacks aimed at AI.

Supplier initiatives have also centred on the development of retrieval augmented generation (RAG) pipelines and hardware architectures to support them. RAG grounds the outputs of a trained model by reference to external, trusted information, in part to tackle so-called hallucinations.
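The retrieval step at the heart of a RAG pipeline can be sketched in a few lines. This toy version scores documents by word overlap instead of a real embedding model, and the corpus text is invented; production pipelines use vector embeddings and dedicated vector search:

```python
def score(query: str, doc: str) -> float:
    """Crude relevance: fraction of query words that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k most relevant documents to ground the model's answer."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

corpus = [
    "QLC flash offers high density at lower cost per terabyte",
    "GPUDirect lets GPUs read from storage without a CPU copy",
]
# The retrieved passage would be prepended to the LLM prompt as trusted
# context, so the model answers from it rather than hallucinating.
print(retrieve("what is QLC flash", corpus))
```

Because every query triggers reads against the document store, RAG shifts some of the storage load from training time to inference time, which is one reason suppliers pair RAG reference architectures with fast storage.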

Which storage suppliers offer products validated for Nvidia DGX?

Numerous storage suppliers have products validated with DGX offerings, including the following.

DataDirect Networks (DDN) offers its A³I AI400X2 all-NVMe storage appliances with SuperPOD. Each appliance delivers up to 90GBps throughput and three million IOPS.

Dell’s AI Factory is an integrated hardware stack spanning desktop and laptop clients, PowerEdge XE9680 server compute and PowerScale F710 storage, plus software and services, validated with Nvidia’s AI infrastructure. It is available via Dell’s Apex as-a-service scheme.

IBM has Spectrum Storage for AI with Nvidia DGX, a converged but separately scalable compute, storage and networking offering validated for Nvidia BasePOD and SuperPOD.

Backup provider Cohesity announced at Nvidia’s GTC 2024 event that it would integrate Nvidia NIM microservices and Nvidia AI Enterprise into its Gaia multicloud data platform, which allows backup and archive data to be used as a source of training data.

Hammerspace has GPUDirect certification with Nvidia. Hammerspace markets its Hyperscale NAS as a global file system built for AI/ML workloads and GPU-driven processing.

Hitachi Vantara has its Hitachi iQ, which provides industry-specific AI systems that use Nvidia DGX and HGX GPUs with the company’s storage.

HPE has GenAI supercomputing and enterprise systems with Nvidia components, a RAG reference architecture, and plans to build in NIM microservices. In March 2024, HPE upgraded its Alletra MP storage arrays to connect twice the number of servers and four times the capacity in the same rack space, with 100Gbps connectivity between nodes in a cluster.

NetApp has product integrations with BasePOD and SuperPOD. At GTC 2024, NetApp announced integration of Nvidia’s NeMo Retriever microservice, a RAG software offering, with its ONTAP hybrid cloud storage.

Pure Storage has AIRI, a flash-based AI infrastructure certified with DGX and Nvidia OVX servers and using Pure’s FlashBlade//S storage. At GTC 2024, Pure announced it had created a RAG pipeline that uses Nvidia NeMo-based microservices with Nvidia GPUs and its storage, plus RAGs for specific industry verticals.

Vast Data launched its Vast Data Platform in 2023, which marries its QLC flash-and-fast-cache storage subsystems with database-like capabilities at native storage I/O level, and DGX certification.

In March 2024, hybrid cloud NAS maker Weka announced a hardware appliance certified to work with Nvidia’s DGX SuperPOD AI datacentre infrastructure.
