Arafath Nihar

Engineer/Researcher building AI and scientific computing systems

Platform, data infrastructure, machine learning, and research engineering.

Bio

Engineer/Researcher building AI and scientific computing systems: petabyte-scale research platforms, cloud-native systems, distributed analytics pipelines, and reproducible ML infrastructure. I architect systems that help researchers turn complex, large-scale data into reproducible science: distributed storage, secure data access, containerized environments, HPC workflow orchestration, graph neural network workflows, and analytics products.

I am currently a Ph.D. candidate in Computer Science at Case Western Reserve University. My work spans Hadoop/HPC research infrastructure, FAIR data systems, photovoltaic fleet modeling, spatiotemporal graph learning, scientific software, and machine-learning platforms for materials data science.

Hadoop + HPC · Cloud architecture · ML infrastructure · Graph neural networks · FAIR data systems · Scientific software

Experience

Solutions Architect — Jatango

Cleveland, OH · Jan 2025 – Aug 2025

Led cloud-native architecture for a live commerce platform built on Azure, Kubernetes, .NET, GraphQL, and Next.js. Translated business requirements into technical architecture, evaluated distributed-system scalability, presented technical roadmaps to stakeholders, and collaborated on patent specifications.

Data Science Team Lead — Lawrence Livermore National Laboratory

Livermore, CA · Aug 2024

Led an undergraduate team applying machine learning to ECG analysis. Built preprocessing, normalization, and Gramian Angular Field image-transformation workflows; compared statistical baselines, CNNs, and ResNet18 transfer learning; and explored interpretable ECG-to-transmembrane-voltage prediction models against prior synthetic-data ECG reconstruction work.

Research Engineer, Materials Data Science — SDLE Research Center / MDS3, Case Western Reserve University

Cleveland, OH · 2021 – 2025

Architected and evolved CRADLE: a hybrid Hadoop-HPC platform supporting materials data science workflows across distributed storage, interactive analysis, large-scale compute, and browser-accessible research environments.

Key contributions:

  • Scaled infrastructure to 1,000+ compute cores, 30 NVIDIA A2 GPUs, 19+ storage nodes, and 2.5 PB raw capacity.
  • Built reproducible Apptainer/Singularity environments exposed through Open OnDemand for researchers, courses, and 500+ students annually.
  • Developed platform tooling including CradleTools, SDLEFleets/CradleFleets, CradleDataLoader, CradleDataExplorer, and cradlesgis.
  • Implemented secure identity and authorization through university SSO, Kerberos-mapped credentials, keytabs, and Apache Ranger.
  • Designed GCP-based CI/CD for container builds, registry, conversion, storage, and HPC deployment automation.
  • Built photovoltaic graph neural network workflows over 100,000+ PV systems and 29 billion time-series measurements for fleet forecasting, imputation, anomaly-oriented reconstruction, and performance analysis.
  • Supported workflows spanning photovoltaic time-series data, AFM image segmentation, geospatial ingestion, semantic web documentation, and large-scale analytics.

Data Scientist / Cloud Engineer — Edifice Analytics

Cleveland, OH · 2019 – 2021

Built automated pipelines for building energy time-series analysis, developed Angular/Plotly dashboards, created secure REST APIs, and designed hybrid AWS architecture connecting cloud services to on-premise Hadoop/HBase infrastructure.

Software Engineer — OrangeHRM

Colombo, Sri Lanka · 2016 – 2019

Designed and maintained production software including automated deployment tooling with real-time WebSocket progress, a hybrid Ionic mobile application for Red Hat, and Symfony/PHP/MySQL systems following test-driven development practices.

Projects

Photovoltaic Graph Neural Networks

Built spatiotemporal graph neural network workflows for PV fleet forecasting, imputation, and performance analysis across 100,000+ systems and 29 billion time-series measurements, including graph construction, Gramian Angular Field features, and graph-attention autoencoder experiments.
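For readers unfamiliar with the Gramian Angular Field step, the sketch below shows the standard summation-field form of the transform: the series is rescaled to [-1, 1], each value is mapped to an angle with arccos, and entry (i, j) of the output is cos(phi_i + phi_j). It is an illustration only, not the project's code. The function name is hypothetical, and a production version would more likely use NumPy than plain lists.

```python
import math


def gramian_angular_summation_field(series):
    """Return the GASF matrix of a 1-D sequence as a list of lists.

    Hypothetical helper for illustration; not taken from the PV workflows.
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]
    # Clamp to guard against floating-point drift outside [-1, 1].
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]
    # Polar encoding: each sample becomes an angle in [0, pi].
    phi = [math.acos(s) for s in scaled]
    # Summation field: pairwise cosine of angle sums.
    return [[math.cos(a + b) for b in phi] for a in phi]


gasf = gramian_angular_summation_field([0.0, 1.0, 2.0, 3.0])
```

The result is a symmetric n × n matrix, which is what lets a 1-D time series be treated as an image-like feature.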

CRADLE Hybrid Hadoop-HPC Architecture

Architected a hybrid platform integrating Hadoop clusters with HPC infrastructure for materials science workloads. Combined distributed preprocessing and large data stores with high-performance CPU/GPU nodes for simulation, modeling, and ML.

Open OnDemand Web Access

Packaged browser-accessible HPC applications for JupyterLab, RStudio Server, code-server, TensorBoard, Spark History Server, Label Studio, Ollama, XFCE desktops, Shiny apps, WebVOWL, and CRADLE Data Explorer.

CradleFleets / SDLEFleets

Built Python tooling that turns Parquet workload tables and worker scripts into validated SLURM array submissions with CPU/GPU controls, logging, queue-aware throttling, requeueing, and status inspection.
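As a rough illustration of the idea (not the CradleFleets/SDLEFleets API), the sketch below renders a SLURM array script in which each array task handles one row of a workload table. The function name, its parameters, and the worker's `--row` flag are assumptions made for this example. The `#SBATCH` directives are standard SLURM options, and the `%N` suffix on `--array` caps how many tasks run at once.

```python
def build_array_script(worker_script, n_tasks, max_concurrent,
                       cpus_per_task=1, gpus=0, log_dir="logs"):
    """Render an sbatch script for a throttled SLURM job array.

    Hypothetical helper: one array task per workload row.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    lines = [
        "#!/bin/bash",
        # Task indices 0..n_tasks-1, at most max_concurrent running together.
        f"#SBATCH --array=0-{n_tasks - 1}%{max_concurrent}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        # %A is the array job ID, %a is the array task index.
        f"#SBATCH --output={log_dir}/%A_%a.out",
        # Allow the scheduler to requeue the job after preemption or node failure.
        "#SBATCH --requeue",
    ]
    if gpus > 0:
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    # The worker reads its row index from the task ID (assumed --row flag).
    lines.append(f'python {worker_script} --row "$SLURM_ARRAY_TASK_ID"')
    return "\n".join(lines) + "\n"


script = build_array_script("worker.py", n_tasks=100, max_concurrent=10, gpus=1)
```

Validating the arguments before rendering keeps malformed array ranges from ever reaching the scheduler.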

CradleTools and data interfaces

Designed Python/R packages abstracting HBase, HDFS/WebHDFS, Impala/Hive, REDCap, Arrow, and SLURM workflows with APIs for table discovery, scans, dataframe ingestion, and zero-copy cross-language exchange.

NSRDB solar data pipeline

Converted National Solar Radiation Database HDF5 datasets spanning 2+ million sites into partitioned Parquet for distributed analytics, with PySpark validation, feature construction, and SLURM-ready execution.

MDS-Onto Open

Contributed to public ontology release workflows for FAIR materials data: React/Vite/Material UI site work, semantic web file publication, generated documentation, downloadable RDF artifacts, and CI/CD publishing.

Browser-accessible scientific workstations

Built CRADLE-oriented graphical Linux workstation containers with XFCE, KasmVNC, HDFView, Gwyddion, Labelme, Firefox, LibreOffice, KDE utilities, WebDAV, Kerberos libraries, and GraphDB Desktop.

CI/CD for Singularity deployment

Designed a GCP-based pipeline from Docker definitions to Cloud Build, Artifact Registry, Apptainer/Singularity conversion, Google Cloud Storage, and SLURM-managed HPC deployment.

WordPress and portfolio sites on AWS VPS

Hosted and administered WordPress web properties including edificeanalytics.com and mds3-coe.com on AWS VPS infrastructure, and maintained a public portfolio site with Linux/cloud hosting, troubleshooting, and operational maintenance.

Fitness Data Science

Personal R/Quarto analytics project that analyzes workout data enriched with Lyfta metadata, with time-series plots, training-volume aggregation, calendar heatmaps, and reproducible publishing to GitHub Pages.

Publications

  1. Rajamohan, B.P., Bradley, A.C.H., Tran, V.D., …, Nihar, A., et al. Materials Data Science Ontology (MDS-Onto): Unifying Domain Knowledge in Materials and Applied Data Science. Scientific Data, 12, 628, 2025. doi:10.1038/s41597-025-04938-5
  2. Ciardi, T.G., Nihar, A., Chawla, R. et al. Materials data science using CRADLE: A distributed, data-centric approach. MRS Communications, 2024. doi:10.1557/s43579-024-00616-6
  3. Hernandez, K.J., Ciardi, T.G., Yamamoto, R. et al. L-PBF High-Throughput Data Pipeline Approach for Multi-modal Integration. Integrating Materials and Manufacturing Innovation, 2024. doi:10.1007/s40192-024-00368-0
  4. Hernandez, K.J., Barcelos, E.I., Jimenez, J.C. et al. A data integration framework of additive manufacturing based on FAIR principles. MRS Advances, 2024. doi:10.1557/s43580-024-00874-5
  5. Nihar, A. et al. Accelerating Time to Science using CRADLE: A Framework for Materials Data Science. IEEE HiPC, Goa, India, 2023, pp. 216–226. doi:10.1109/HiPC58850.2023.00041
  6. Lu, M., Venkat, S.N., Augustino, J. et al. Image Processing Pipeline for Fluoroelastomer Crystallite Detection in Atomic Force Microscopy Images. Integrating Materials and Manufacturing Innovation, 2023. doi:10.1007/s40192-023-00320-8
  7. Nihar, A. et al. Toward Findable, Accessible, Interoperable and Reusable (FAIR) Photovoltaic System Time Series Data. IEEE PVSC, 2021, pp. 2481–2486. doi:10.1109/PVSC43889.2021.9518782
  8. Huang, L., Hall, S., Shao, F., Nihar, A., Chaudhary, V., Wu, Y., French, R., and Xiao, X. System-Auditing, Data Analysis and Characteristics of Cyber Attacks for Big Data Systems. ACM CIKM, 2022. doi:10.1145/3511808.3557185
  9. Li, M. et al. Data-Driven Photovoltaic Module Performance Analysis with FAIR Data. IEEE PVSC, 2023. doi:10.1109/PVSC48320.2023.10359605

Presentations & Awards

  • Invited speaker, CRADLE: A Distributed and High-Performance Computing Framework for Research, 9th CWRU x Tohoku Joint Workshop / CWRU-Tohoku Joint Symposium, Tohoku University, Sendai, Japan, August 2023.
  • Unifying High-Performance and Distributed Computing for Materials Data Science, 2023 Fall Materials Research Society Meeting, Boston, MA.
  • Towards Usability and Reproducibility in Distributed and High Performance Computing Environment for Big Data Research with CRADLE, 8th Annual Data Science in Engineering and Life Sciences Symposium, Case Western Reserve University.
  • Lawrence Livermore National Lab Data Science Challenge Scholar, 2024.
  • IEEE HiPC Best Paper Nomination, 2023.

Education

  • Ph.D. in Computer Science, Case Western Reserve University — expected May 2026. Dissertation: Auto Scaling Materials Data Science Machine Learning Pipelines.
  • B.S. in Computer Science, University of Colombo School of Computing — 2016.

Technical areas

  • Infrastructure: Hadoop, HDFS, HBase, Ozone, JanusGraph, Impala, Hive, YARN, SLURM, Linux, Kerberos, Apache Ranger.
  • Cloud & platforms: AWS, Azure, GCP, Kubernetes, Docker, Apptainer/Singularity, Open OnDemand, NGINX.
  • Data & ML: Python, R, Spark, PyTorch, TensorFlow, ONNX, Optuna, RAPIDS, graph neural networks, graph autoencoders, spatiotemporal forecasting, pandas, PyArrow, geospatial and time-series analytics.
  • Web & product: Next.js, React, Vite, Angular, GraphQL, REST APIs, WordPress, Plotly, Shiny, Quarto.