The REmotely-managed Power Aware Computing Systems and Services (REPACSS) resource is a high-performance computing (HPC) cluster supported by multiple forms of energy. It was developed to support research into advanced data center control for running scalable scientific workflows and data-intensive research in remotely managed settings. The project focuses on improving data center and infrastructure control so that the facility can adapt to emergent conditions and adjust workloads to match data center load conditions, including the availability and cost of electrical power. The CPU infrastructure comprises 110 AMD EPYC 9754 compute nodes with access to high-speed cluster-wide storage. Each CPU compute node offers 256 cores and 1.5 TB of DDR5 memory, along with 1.92 TB of local NVMe for swap and temporary storage to support high-speed checkpoint and restore and local ephemeral usage. The CPU nodes are interconnected with the rest of the cluster and with storage by NVIDIA ConnectX-7 NDR InfiniBand adapters running at 200 Gbps per card, with two InfiniBand cards per node. The Hammerspace storage provides nearly 3 PB of combined NVMe and HDD storage, supporting large-scale data throughput. All nodes are controlled and provisioned through high-bandwidth Dell PowerSwitch S5248-ON and S5232-ON Ethernet switches at 25 Gbps per node. The cluster supports intelligent workload placement and adaptive scheduling tools that aim to match as much of the workload as possible to periods of low-cost energy availability. REPACSS also features advanced remote management capabilities and automation tools for managing scientific workflows, designed specifically to be adopted at scale by other resource facilities and industry.
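For sizing an allocation request, the per-node figures above can be rolled up into nominal cluster-wide totals. The following is a minimal sketch, not an official capacity statement: it assumes all 110 nodes are identically configured, treats TB as decimal units, and ignores any capacity reserved for the system. The function and constant names are illustrative only.

```python
# Per-node figures taken from the resource description above.
NODES = 110
CORES_PER_NODE = 256
MEMORY_TB_PER_NODE = 1.5
LOCAL_NVME_TB_PER_NODE = 1.92
IB_CARDS_PER_NODE = 2
IB_GBPS_PER_CARD = 200


def aggregate_capacity():
    """Return nominal cluster-wide totals (assumes identical nodes)."""
    return {
        "total_cores": NODES * CORES_PER_NODE,
        "total_memory_tb": NODES * MEMORY_TB_PER_NODE,
        "total_local_nvme_tb": round(NODES * LOCAL_NVME_TB_PER_NODE, 2),
        # Combined nominal InfiniBand bandwidth of both cards in one node.
        "ib_gbps_per_node": IB_CARDS_PER_NODE * IB_GBPS_PER_CARD,
        # Assumption: 1 TB = 1000 GB (decimal units).
        "memory_gb_per_core": MEMORY_TB_PER_NODE * 1000 / CORES_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in aggregate_capacity().items():
        print(f"{name}: {value}")
```

Under these assumptions the CPU partition totals 28,160 cores and 165 TB of memory, with a nominal 400 Gbps of InfiniBand bandwidth per node. Actual usable capacity may be lower.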
  
    
      Texas Tech REPACSS CPU
  
Resource Type
              Compute
          Latest Status
              production
          Description
              User Guide URL
              
          Features
          Is an ACCESS Allocated Production Compute resource
          General compute use
          Unique, innovative or non-traditional compute resource
          Resource supports community software areas for users to share software with other users
          Resource offers discounted job queues where running jobs can be preempted
          Resource is allocated by ACCESS
          An intuitive, innovative, and interactive interface to remote computing resources
          Provides Globus data transfer and data sharing services for local storage
          preemption
          NSF ACSS Category 2 Resources
          AI tools and support
              Organization Name
              Texas Tech University
          Global Resource ID
              repacss-cpu.ttu.access-ci.org