MaCBench — Multimodal Benchmark for Chemistry & Materials Research
M3RG Lab | Workshop (NeurIPS AIforMat Spotlight): OpenReview | Extended version: arXiv:2411.16955 | Journal: Nature Computational Science
- Co-developed MaCBench, a real-world benchmark spanning data extraction, experimental understanding, and results interpretation for chemistry/materials workflows.
- Ran systematic evaluations of frontier multimodal models and characterized failure modes beyond basic perception, especially in spatial reasoning, cross-modal synthesis, and multi-step inference.
- Released the initial workshop spotlight paper and contributed to the expanded evaluation and analysis in the extended and journal versions.
Knowledge Base Effort
Prof. N.M. Anoop Krishnan & Prof. Mausam | M3RG Lab | GitHub: Repo
- Developed a knowledge base and data extraction model to aggregate materials science data from scientific literature, forming a foundation for domain-specific large language models.
- Focused on advanced entity linkage and extracting scientific information such as chemical formulas to enhance data accessibility and usability.
- Built an end-to-end, multi-agent pipeline that ingests publisher XML files of research papers, extracts structured text and tables, mines chemical compositions, and contextually links them to material property queries with evidence and confidence scoring.
- Productionized a fine-tuned 8B LLaMAT model (local, GPU-aware, low-cost) with domain-adapted prompts, scientific sentence chunking and table classification that feed a growing materials knowledge base.
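The composition-mining step above can be illustrated with a toy example. This is a minimal sketch only: the actual pipeline relies on the fine-tuned LLaMAT model, whereas this stand-in uses a regular expression. The names `ELEMENTS`, `parse_formula`, and `mine_compositions` are hypothetical, and the element set is a small illustrative subset.

```python
import re
from collections import Counter
from typing import Optional

# Assumption: a small illustrative subset of element symbols, not the full periodic table.
ELEMENTS = {"H", "B", "C", "N", "O", "Na", "Mg", "Al", "Si", "P", "K", "Ca", "Ti", "Fe", "Zn"}

# One element symbol followed by an optional integer or decimal count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula: str) -> Optional[dict]:
    """Parse a flat formula such as 'Al2O3' into {'Al': 2.0, 'O': 3.0}, or return None."""
    counts = Counter()
    pos = 0
    for match in TOKEN.finditer(formula):
        # Reject gaps between tokens and symbols outside the known element set.
        if match.start() != pos or match.group(1) not in ELEMENTS:
            return None
        counts[match.group(1)] += float(match.group(2) or 1)
        pos = match.end()
    # The whole string must be consumed and contain at least one element.
    return dict(counts) if counts and pos == len(formula) else None


def mine_compositions(sentence: str) -> list:
    """Return (token, composition) pairs for every formula-like token in a sentence."""
    hits = []
    for raw in sentence.split():
        token = raw.strip(".,;:()")
        composition = parse_formula(token)
        if composition is not None:
            hits.append((token, composition))
    return hits


for token, composition in mine_compositions(
    "The glass contains SiO2 and Na2O, with traces of Al2O3."
):
    print(token, composition)
```

A pattern-based pass like this produces false positives (single capital letters such as "C" or "N" parse as elements) and ignores parentheses and hydrates, which is one reason a learned extractor with confidence scoring is the better fit for real papers.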
SmolMoE — Mixture-of-Experts Transformer Language Model (Upcycling + Continued Pretraining)
Cohere Labs — Scholar Take-Home
- Implemented a compact decoder-only language model with Mixture-of-Experts (MoE) feed-forward blocks, integrating routing logic and MoE-aware training components.
- Added MoE observability and routing-health monitoring: expert utilization / load distribution tracking, plus a routing-specialization style metric to quantify how selectively experts are used.
- Built an upcycling path that converts a dense Transformer into an MoE model by copying backbone weights and initializing expert banks from the dense MLP, enabling continued pretraining from a dense checkpoint.
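The upcycling and routing-health ideas above can be sketched without any deep learning framework. This is a minimal illustration under stated assumptions, not the project's code: weights are plain nested lists, the function names are hypothetical, and the specialization score (one minus normalized routing entropy) is one plausible choice rather than the metric actually used in SmolMoE.

```python
import copy
import math
from collections import Counter


def upcycle_mlp(dense_mlp: dict, num_experts: int) -> list:
    """Initialize each expert as an independent deep copy of the dense MLP weights."""
    return [copy.deepcopy(dense_mlp) for _ in range(num_experts)]


def expert_utilization(assignments: list, num_experts: int) -> list:
    """Fraction of routed tokens that each expert received."""
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(expert, 0) / total for expert in range(num_experts)]


def routing_specialization(utilization: list) -> float:
    """Assumed metric: 1 - normalized entropy of the load distribution.

    0.0 means perfectly uniform expert use; 1.0 means one expert takes every token.
    """
    n = len(utilization)
    if n < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in utilization if p > 0)
    return 1.0 - entropy / math.log(n)


# Hypothetical dense MLP weights and top-1 routing decisions for eight tokens.
dense = {"w_in": [[0.1, 0.2], [0.3, 0.4]], "w_out": [[0.5], [0.6]]}
experts = upcycle_mlp(dense, num_experts=4)
load = expert_utilization([0, 0, 1, 3, 0, 1, 0, 0], num_experts=4)
print(load, routing_specialization(load))
```

Because every expert starts as a copy of the same MLP, the upcycled model initially computes the same function as the dense one whenever the selected experts' gate weights sum to one (for example top-1 routing with a unit gate, or top-k routing with renormalized gates). Continued pretraining is then what lets the experts diverge.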
AstroLLaVA — Astronomy Vision-Language Model
UniverseTBD | Paper: arXiv:2504.08583
- Contributed to evaluation + deployment workflows for AstroLLaVA, a domain-adapted vision-language model for astronomy built on the LLaVA stack.
- Supported experimental comparisons on astronomy image–text tasks and helped harden the pipeline for reproducible runs and model release.
- Project output includes curated astronomy visual QA resources and benchmarking for astronomy-focused multimodal reasoning.
Meta-Agentic Retrieval-Augmented Generation System
Hackathon | Inter-IIT Technical Meet, 2024
- Developed a platform integrating Retrieval-Augmented Generation (RAG) with dynamic knowledge graph creation, enabling real-time retrieval from diverse sources such as academic papers, web content, and live news.
- Designed a meta-agent for autonomous optimization, agent creation, and feedback incorporation, enhancing adaptability and accuracy.
Hangman Solver — Multi-Model Ensemble
Trexquant Quant Hiring Challenge | Sequence Models, Boosted Trees & N-gram LMs
- Models explored: baseline heuristic, BiLSTM consonant predictor, BiLSTM-Attention, LightGBM (26 binary letter classifiers), 1–5 gram frequency tables, static neural–symbolic blends, temperature-scaled fusion, candidate-pruning inference, and an MLP meta-learner (abandoned).
- System design: length-aware vowel priors, masked-state modeling, calibrated probability fusion, fast n-gram lookup, and constraint-based candidate filtering.
- Top results:
  - Offline evaluation peaked at a 0.81 win rate (BiLSTM-Attn + calibrated blend + pruning).
  - Public server runs achieved a 0.646 mean win rate (3×1000 games) versus a baseline of roughly 0.18.
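The constraint-based candidate filtering mentioned above can be sketched as follows. This is a simplified illustration, assuming lowercase a-z words and `_` for hidden letters. The function names are hypothetical, and the frequency-based guess is a stand-in for the neural and n-gram scoring used in the actual ensemble.

```python
import re
from collections import Counter
from typing import Optional


def filter_candidates(pattern: str, wrong: set, words: list) -> list:
    """Keep dictionary words consistent with the revealed pattern and the wrong guesses."""
    revealed = set(pattern) - {"_"}
    excluded = revealed | wrong
    # A hidden slot cannot hold a letter that is already revealed or known to be wrong.
    slot = "[^" + "".join(sorted(excluded)) + "]" if excluded else "."
    regex = re.compile("".join(slot if ch == "_" else re.escape(ch) for ch in pattern))
    return [word for word in words if regex.fullmatch(word)]


def next_guess(pattern: str, wrong: set, words: list) -> Optional[str]:
    """Guess the unguessed letter that appears in the most surviving candidates."""
    guessed = (set(pattern) - {"_"}) | wrong
    counts = Counter()
    for word in filter_candidates(pattern, wrong, words):
        counts.update(set(word) - guessed)  # count each letter once per word
    return counts.most_common(1)[0][0] if counts else None


vocabulary = ["apple", "ample", "angle", "ankle", "amble", "maple"]
print(filter_candidates("a__le", {"p"}, vocabulary))
print(next_guess("a__le", {"p"}, vocabulary))
```

Pruning the dictionary first means any downstream scorer only has to rank letters that can still appear, which is presumably where the filtering helps when combined with learned predictors.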
AstroLLaMA
Dr. Ioana Ciuca | UniverseTBD
- Co-developed AstroLLaMA, a 7-billion-parameter language model fine-tuned from LLaMA-2 using over 300,000 astronomy abstracts from arXiv, achieving a 30% reduction in perplexity compared to LLaMA-2.
- Facilitated the public release of AstroLLaMA to promote astronomy-focused research, including applications in automatic paper summarization and the development of conversational agents.
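To make the perplexity figure above concrete: perplexity is the exponential of the mean per-token negative log-likelihood, so a 30% reduction corresponds to a drop of about 0.357 nats per token. The sketch below uses made-up loss values purely for illustration; they are not AstroLLaMA or LLaMA-2 measurements, and the function names are hypothetical.

```python
import math


def perplexity(token_nlls: list) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_reduction(base_ppl: float, tuned_ppl: float) -> float:
    """Fractional drop in perplexity of the tuned model relative to the base model."""
    return 1.0 - tuned_ppl / base_ppl


# Hypothetical per-token losses: the tuned model is better by -ln(0.7) nats per token.
base_losses = [2.0, 2.0, 2.0, 2.0]
tuned_losses = [loss + math.log(0.7) for loss in base_losses]

reduction = relative_reduction(perplexity(base_losses), perplexity(tuned_losses))
print(f"per-token gain: {-math.log(0.7):.3f} nats, perplexity reduction: {reduction:.0%}")
```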
AstroTalks
Dr. David Hendriks | UniverseTBD
- Developed a program that uses OpenAI's Whisper speech-recognition model to transcribe video datasets.
- Collaborated with NASA ADS to create and integrate LLM-based pipelines for real-time transcript extraction, topic modeling, and periodic model retraining, enhancing platform performance and accessibility.