Monday, November 13, 2023

JUPITER supercomputer to leverage NVIDIA Grace Hopper, InfiniBand

JUPITER, an exascale supercomputer being built at the Forschungszentrum Jülich facility in Germany, will be powered by the NVIDIA Grace Hopper accelerated computing architecture. NVIDIA says it will be the world’s most powerful AI system when completed in 2024, able to deliver extreme-scale computing power for AI and simulation workloads.

JUPITER, which is owned by the EuroHPC Joint Undertaking and contracted to Eviden and ParTec, is being built in collaboration with NVIDIA, ParTec, Eviden and SiPearl to accelerate the creation of foundational AI models in climate and weather research, material science, drug discovery, industrial engineering and quantum computing.

JUPITER marks the debut of a quad NVIDIA GH200 Grace Hopper Superchip node configuration, based on Eviden’s BullSequana XH3000 liquid-cooled architecture, with a booster module comprising close to 24,000 NVIDIA GH200 Superchips interconnected with the NVIDIA Quantum-2 InfiniBand networking platform.  

The NVIDIA Quantum-2 family of switches provides 64 400Gb/s ports or 128 200Gb/s ports on 32 physical octal small form-factor pluggable (OSFP) connectors. The compact 1U switch design comes in air-cooled and liquid-cooled versions that are either internally or externally managed. The family delivers an aggregate 51.2 terabits per second (Tb/s) of bidirectional throughput with a capacity of more than 66.5 billion packets per second.
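A quick back-of-the-envelope check shows how the quoted port counts and the 51.2 Tb/s aggregate figure fit together (assuming, as is common in switch marketing, that the aggregate counts both directions of full-duplex links):

```python
# Sanity-check of the NVIDIA Quantum-2 switch figures quoted above.

PORTS_400G = 64        # NDR 400 Gb/s ports
PORT_SPEED_GBPS = 400

# One direction: 64 ports x 400 Gb/s = 25.6 Tb/s
unidirectional_tbps = PORTS_400G * PORT_SPEED_GBPS / 1000

# Counting both directions of each full-duplex link yields the
# 51.2 Tb/s aggregate bidirectional figure.
bidirectional_tbps = 2 * unidirectional_tbps

print(unidirectional_tbps)   # 25.6
print(bidirectional_tbps)    # 51.2
```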

NVIDIA's quad GH200 node architecture provides 288 Arm Neoverse cores and can achieve 16 petaflops of AI performance with up to 2.3 terabytes of high-speed memory. The four GH200 processors in each node are interconnected through high-speed NVIDIA NVLink.
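The core count follows directly from the node configuration, since each GH200's Grace CPU carries 72 Arm Neoverse V2 cores; the implied node count for the booster module below is our own inference from the ~24,000-superchip figure, not an NVIDIA-published number:

```python
# Sanity-check of the quad GH200 node figures quoted above.

CORES_PER_GH200 = 72       # Grace CPU: 72 Arm Neoverse V2 cores
SUPERCHIPS_PER_NODE = 4

cores_per_node = SUPERCHIPS_PER_NODE * CORES_PER_GH200
print(cores_per_node)      # 288, matching the article

# Rough node count implied by the booster module's ~24,000 superchips
total_superchips = 24_000
approx_nodes = total_superchips // SUPERCHIPS_PER_NODE
print(approx_nodes)        # ~6,000 quad nodes
```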

“The JUPITER supercomputer powered by NVIDIA GH200 and using our advanced AI software will deliver exascale AI and HPC performance to tackle the greatest scientific challenges of our time,” said Ian Buck, vice president of hyperscale and HPC at NVIDIA. “Our work with Jülich, Eviden and ParTec on this groundbreaking system will usher in a new era of AI supercomputing to advance the frontiers of science and technology.”

“At the heart of JUPITER is NVIDIA’s accelerated computing platform, making it a groundbreaking system that will revolutionize scientific research,” said Thomas Lippert, director of the Jülich Supercomputing Centre. “JUPITER combines exascale AI and exascale HPC with the world’s best AI software ecosystem to boost the training of foundational models to new heights.”

NVIDIA debuts HGX H200 Tensor Core GPU

NVIDIA launched its H200 Tensor Core GPU based on its Hopper architecture and designed with advanced memory to handle massive amounts of data for generative AI and high performance computing workloads.

The NVIDIA H200 is the first GPU to offer HBM3e — faster, larger memory to fuel the acceleration of generative AI and large language models, while advancing scientific computing for HPC workloads. With HBM3e, the NVIDIA H200 delivers 141GB of memory at 4.8 terabytes per second, nearly double the capacity and 2.4x more bandwidth compared with its predecessor, the NVIDIA A100.
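The "nearly double the capacity and 2.4x more bandwidth" claim can be checked against the commonly cited A100 80GB figures (80 GB of HBM2e at roughly 2 TB/s); the A100 numbers here are our assumption, not from the article:

```python
# Comparing the H200 figures quoted above against assumed A100 80GB specs.

H200_MEM_GB, H200_BW_TBPS = 141, 4.8
A100_MEM_GB, A100_BW_TBPS = 80, 2.0   # assumed A100 80GB figures

capacity_ratio = round(H200_MEM_GB / A100_MEM_GB, 2)
bandwidth_ratio = H200_BW_TBPS / A100_BW_TBPS

print(capacity_ratio)    # 1.76, i.e. "nearly double" the capacity
print(bandwidth_ratio)   # 2.4x the bandwidth
```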

H200-powered systems from the world’s leading server manufacturers and cloud service providers are expected to begin shipping in the second quarter of 2024.



NVIDIA H200 will be available in NVIDIA HGX H200 server boards with four- and eight-way configurations, which are compatible with both the hardware and software of HGX H100 systems.

NVIDIA says the introduction of H200 will lead to further performance leaps, including nearly doubling inference speed on Llama 2, a 70 billion-parameter LLM, compared to the H100. Additional performance leadership and improvements with H200 are expected with future software updates. Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure will be among the first cloud service providers to deploy H200-based instances starting next year, in addition to CoreWeave, Lambda and Vultr.

“To create intelligence with generative AI and HPC applications, vast amounts of data must be efficiently processed at high speed using large, fast GPU memory,” said Ian Buck, vice president of hyperscale and HPC at NVIDIA. “With NVIDIA H200, the industry’s leading end-to-end AI supercomputing platform just got faster to solve some of the world’s most important challenges.”


U.S. outlines National Spectrum Strategy

The White House published a National Spectrum Strategy that aims to expand access to advanced wireless broadband networks and technologies, whether terrestrial-, airspace-, satellite- or space-based, for all Americans.

The 23-page paper was developed by the National Telecommunications and Information Administration (NTIA), in collaboration with the FCC and in coordination with other Federal agencies.

The Strategy has four pillars:

  • Pillar One: A Spectrum Pipeline to Ensure U.S. Leadership in Advanced and Emerging Technologies
  • Pillar Two: Collaborative Long-Term Planning to Support the Nation’s Evolving Spectrum Needs
  • Pillar Three: Unprecedented Spectrum Innovation, Access, and Management through Technology Development
  • Pillar Four: Expanded Spectrum Expertise and Elevated National Awareness

Significantly, this Strategy identifies five spectrum bands in government hands, totaling 2,786 megahertz of mostly mid-band spectrum, for in-depth, near-term study to determine suitability for potential repurposing to address evolving needs, including terrestrial wireless broadband, innovative space services, and unmanned aviation and other autonomous vehicle operations. This includes the following:

Lower 3 GHz (3.1-3.45 GHz)

  • The Department of Defense (DoD) has studied the potential for sharing 350 megahertz of spectrum with the private sector, determining that sharing is feasible with advanced interference-mitigation features and a coordination framework.
  • The Departments of Commerce and Defense will co-lead follow-on studies focusing on future use of the 3.1-3.45 GHz band, exploring dynamic spectrum sharing and private-sector access while preserving Federal mission capabilities.

5030-5091 MHz

  • The FCC, in coordination with NTIA and the Federal Aviation Administration, will facilitate limited deployment of UAS in this band, followed by studies to optimize UAS spectrum access while avoiding harmful interference to other operations.

7125-8400 MHz

  • This 1,275 megahertz of spectrum will be studied for wireless broadband use, with some sub-bands potentially studied for other uses, while protecting incumbent users from harmful interference.

18.1-18.6 GHz

  • This 500 megahertz of spectrum will be studied for expanded Federal and non-Federal satellite operations, consistent with the U.S. position at the 2023 World Radiocommunication Conference.

37.0-37.6 GHz

  • This 600 megahertz of spectrum will be further studied to implement a co-equal, shared-use framework allowing Federal and non-Federal users to deploy operations in the band.
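The five bands listed above should account for the 2,786 megahertz the Strategy cites; a quick check in MHz confirms the total:

```python
# Summing the widths of the five bands identified in the Strategy, in MHz.

bands = {
    "Lower 3 GHz (3.1-3.45 GHz)": 3450 - 3100,    # 350 MHz
    "5030-5091 MHz":              5091 - 5030,    # 61 MHz
    "7125-8400 MHz":              8400 - 7125,    # 1,275 MHz
    "18.1-18.6 GHz":              18600 - 18100,  # 500 MHz
    "37.0-37.6 GHz":              37600 - 37000,  # 600 MHz
}

total_mhz = sum(bands.values())
print(total_mhz)   # 2786, matching the Strategy's figure
```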

HPE tunes its supercomputing solutions for Gen AI

At Supercomputing 23 in Denver, HPE announced a supercomputing solution for generative AI designed for large enterprises, research institutions, and government organizations to accelerate the training and tuning of artificial intelligence (AI) models using private data sets.

Key elements:

  • AI/ML acceleration software – A suite of three software tools to help customers train and tune AI models and create their own AI applications:
      • HPE Machine Learning Development Environment – a machine learning (ML) software platform that enables customers to develop and deploy AI models faster by integrating with popular ML frameworks and simplifying data preparation.
      • NVIDIA AI Enterprise – delivers security, stability, manageability, and support, with extensive frameworks, pretrained models, and tools that streamline the development and deployment of production AI.
      • HPE Cray Programming Environment – a complete set of tools for developing, porting, debugging and refining code.
  • Scale – Based on the HPE Cray EX2500, an exascale-class system, and featuring NVIDIA GH200 Grace Hopper Superchips, the solution can scale up to thousands of graphics processing units (GPUs) with an ability to dedicate the full capacity of nodes to support a single, AI workload for faster time-to-value. The system is the first to feature the quad GH200 Superchip node configuration.
  • HPE Slingshot Interconnect offers an open, Ethernet-based high performance network designed to support exascale-class workloads. 

“The world’s leading companies and research centers are training and tuning AI models to drive innovation and unlock breakthroughs in research, but to do so effectively and efficiently, they need purpose-built solutions,” said Justin Hotard, executive vice president and general manager, HPC, AI & Labs at Hewlett Packard Enterprise. “To support generative AI, organizations need to leverage solutions that are sustainable and deliver the dedicated performance and scale of a supercomputer to support AI model training. We are thrilled to expand our collaboration with NVIDIA to offer a turnkey AI-native solution that will help our customers significantly accelerate AI model training and outcomes.”

https://www.hpe.com/us/en/newsroom/press-release/2023/11/hewlett-packard-enterprise-and-nvidia-accelerate-ai-training-with-new-turnkey-solution.html

Utah’s Strata Networks picks Ekinops for optical network

 Ekinops has been selected by Strata Networks, Utah's largest telecommunications cooperative, to upgrade its optical transport network using the Ekinops360 with FlexRate technology. 

Strata Networks, based in Roosevelt, Utah, extends its network throughout the Uintah Basin, into the Wasatch Front, and to Denver, serving a diverse demographic including mid-sized urban and remote rural areas. Strata also chose Ekinops' advanced network management system.

Ekinops says its optical solution along with 200G and 400G FlexRate modules and Celestis NM will enhance the scope and performance of Strata's optical transport network. This upgrade allows Strata to extend 100G links all the way to its point-of-presence in Denver, serving its customers in Colorado. 

The Ekinops PM400FR05 utilizes high-power pluggable coherent optics to provide up to 400G of capacity for metro/regional connectivity at a lower cost than traditional transponders. 

Additionally, Celestis NMS provides Strata with complete control over its network, allowing for monitoring, troubleshooting, and service upgrades from a centralized network operations center, thereby minimizing truck rolls and reducing the company's carbon footprint.

https://www.ekinops.com

Linux Foundation to form the High Performance Software Foundation

The Linux Foundation announced its intention to form the High Performance Software Foundation (HPSF), which aims to build, promote, and advance a portable software stack for high performance computing (HPC).

HPSF intends to leverage investments made by the United States Department of Energy's (DOE) Exascale Computing Project (ECP), the EuroHPC Joint Undertaking, and other international projects in accelerated HPC to exploit the performance of this diversifying set of architectures. 

HPSF will be organized as an umbrella project under the Linux Foundation. It will provide a neutral space for pivotal projects in the high performance software ecosystem, enabling industry, academia, and government entities to collaborate together on the scientific software stack.

The HPSF is launching with the following initial open source technical projects:

  • Spack: the HPC package manager
  • Kokkos: a performance-portable programming model for writing modern C++ applications in a hardware-agnostic way.
  • AMReX: a performance-portable software framework designed to accelerate solving partial differential equations on block-structured, adaptively refined meshes.
  • WarpX: a performance-portable Particle-in-Cell code with advanced algorithms that won the 2022 Gordon Bell Prize
  • Trilinos: a collection of reusable scientific software libraries, known in particular for linear, non-linear, and transient solvers, as well as optimization and uncertainty quantification.
  • Apptainer: a container system and image format specifically designed for secure high-performance computing.
  • VTK-m: a toolkit of scientific visualization algorithms for accelerator architectures.
  • HPCToolkit: performance measurement and analysis tools for computers ranging from laptops to the world’s largest GPU-accelerated supercomputers.
  • E4S: the Extreme-scale Scientific Software Stack
  • Charliecloud: HPC-tailored, lightweight, fully unprivileged container implementation.

O-RAN Alliance awards $200K to Northeastern University

The O-RAN ALLIANCE (O-RAN) has awarded $200,000 in seed funding to the Institute for the Wireless Internet of Things at Northeastern University for its proposal to develop an O-RAN digital twin platform based on the Colosseum network emulator, with the capability to automate end-to-end AI/ML development, integration, and testing.

The funding initiative was led by O-RAN ALLIANCE's next Generation Research Group (nGRG). Its objective is to provide a forum to facilitate O-RAN related 6G research efforts and determine how O-RAN may evolve to support mobile wireless networks in the 6G timeframe and beyond, by leveraging industry and academic 6G research efforts worldwide. The purpose of the seed funding is to be a significant enabler for broader funding of research platforms for next generation infrastructure.

In addition to the winning proposal, the O-RAN ALLIANCE also recognized two other proposals with honorable mentions for their excellent quality and value for the industry:

  • EURECOM and OpenAirInterface Software Alliance – Evolving 5G End-to-End Network Platform towards a Next Generation Infrastructure
  • Virginia Polytechnic Institute and State University and George Mason University – FEMO-CLOUD: Federated, Multi-site O-Cloud Platform for Next generation RAN Research and Experimentation

“The O-RAN ALLIANCE continues to focus on developing a stable and mature specification framework for open and intelligent RAN, enabling the RAN industry to deliver commercial products and solutions,” said Alex Jinsung Choi, Chair of the Board of O-RAN ALLIANCE, and SVP Network Technology at Deutsche Telekom. “It’s great to see such high interest and cooperation in research for open innovations in future RAN generations, which will provide the basis for upcoming detailed specifications by the O-RAN ALLIANCE to enable even higher-performing and more feature-rich mobile networks.”

www.o-ran.org

Arm appoints Ami Badani as Chief Marketing Officer

Arm appointed Ami Badani as chief marketing officer (CMO).

Badani joins Arm from NVIDIA where she held the role of Vice President of Marketing and Developer Products. At NVIDIA, her responsibilities included cultivating the developer ecosystem for Data Processing Units (DPUs), driving the data strategy for Generative AI, and leading the company’s product and technical marketing efforts for the data center portfolio, one of NVIDIA’s largest growth areas. Prior to NVIDIA, Ami was CMO at Cumulus Networks, a provider of enterprise-class software that was acquired by NVIDIA in 2020. Badani also held marketing and product management leadership roles at several technology companies, including Cisco Systems. Prior to joining Cisco, Badani worked as an investment banker at Goldman Sachs and J.P. Morgan.

“As we continue to advance the Arm compute platform, reaching a more diverse set of customers and developers in the AI era is critical,” said Rene Haas, chief executive officer, Arm. “Ami’s experience in AI and proven track record in creating awareness among developer ecosystems make her a natural fit to lead our marketing efforts in building the future of computing on Arm.”

https://www.arm.com




Arelion activates PoP at DataVerge interconnect in Brooklyn

Arelion established a point of presence (PoP) for its Internet backbone, AS1299, at DataVerge, the owner and operator of the only carrier-neutral interconnection facility in Brooklyn. This provides DataVerge customers access to Arelion’s portfolio of leading connectivity services, including high-speed IP Transit, Dedicated Internet Access (DIA), Cloud Connect, Global 40G Ethernet Virtual Circuit (VC), IPX, and DDoS Mitigation services. High availability is guaranteed by dual entry points and diverse paths into the DataVerge data center.

https://www.arelion.com/about-us/press-releases/new-pop-in-new-york