About

I am an Assistant Professor in the Department of Biostatistics and Health Data Science at the University of Pittsburgh. Previously, I was a postdoctoral researcher in the laboratory of Prof. Gordon Smyth at the Walter and Eliza Hall Institute of Medical Research. I received my PhD in Biostatistics from the University of North Carolina at Chapel Hill under the supervision of Dr. Naim Rashid and Dr. Joseph Ibrahim.

My research focuses on developing statistical methods and open-source bioinformatic tools to analyze data from a wide range of high-throughput genomic, transcriptomic, and proteomic technologies. I enjoy creating new statistical methods and writing efficient software that helps researchers interpret their data. I am particularly interested in developing methods and software for the Bioconductor Project. One key goal of the methods I develop is to identify molecular features, such as genomic coordinates, genes/transcripts, or proteins, that change in accessibility, expression, or abundance between experimental conditions.

At Pitt, I collaborate with research groups from the Department of Orthopedic Surgery to understand the genomic and molecular mechanisms in spine-related diseases, other orthopedic issues, and traumatic brain injury. These collaborations provide the foundation for my methodological and computational ideas.

Projects

  • Differential transcript expression with edgeR v4
    • Improved differential transcript expression pipeline with Salmon’s Gibbs sampling and the new bias-corrected quasi-likelihood method with adjusted deviances for small counts from edgeR v4.
    • R/Bioconductor
    • Workflow
    • Preprint
  • catchSalmon/catchKallisto (within edgeR)
    • Estimation of mapping ambiguity overdispersion from transcript quantification of short-read RNA-seq data. It unlocks uncertainty-free differential expression assessment at the transcript level within edgeR.
    • R/Bioconductor
    • Workflow and User’s Guide
    • Paper
  • epigraHMM
    • A toolkit for the analysis of epigenomic datasets such as ChIP-seq, ATAC-seq, CUT&RUN, and CUT&Tag. It performs differential and consensus peak calling from multi-sample, multi-condition datasets.
    • R/Bioconductor
    • Vignette
    • Paper
  • ZIMHMM
    • A consensus peak caller for epigenomic datasets. It implements a fast hidden Markov model with mixed-effects zero-inflated negative binomial emissions using sample-specific random effects.
    • GitHub
    • Paper

Timeline

Pedro Baldoni

