Pedro Baldoni

About

I am an Assistant Professor in the Department of Biostatistics and Health Data Science at the University of Pittsburgh. Previously, I was a postdoctoral researcher in the laboratory of Prof. Gordon Smyth at the Walter and Eliza Hall Institute of Medical Research. I received my PhD in Biostatistics from the University of North Carolina at Chapel Hill under the supervision of Dr. Naim Rashid and Dr. Joseph Ibrahim.

My research focuses on developing statistical methods and open-source bioinformatics tools for analyzing data from a wide range of high-throughput genomic, transcriptomic, and proteomic technologies. I enjoy writing efficient software that helps researchers interpret their data, and I am particularly interested in developing methods and software for the Bioconductor Project. One key goal of the methods I develop is to identify molecular features, such as genomic coordinates, genes/transcripts, or proteins, that change in accessibility, expression, or abundance between experimental conditions.

Projects

Differential transcript usage with limma and edgeR
- Uncertainty-aware differential transcript usage analysis method with divided counts. The diffSplice function is now a generic function with methods for limma MArrayLM objects and edgeR DGEGLM objects for Bioconductor v3.21 and above.
- limma and edgeR
- Workflow
- Preprint
Differential transcript expression with edgeR v4
- Improved differential transcript expression pipeline with Salmon’s Gibbs sampling and the new bias-corrected quasi-likelihood method with adjusted deviances for small counts from edgeR v4.
- R/Bioconductor
- Workflow
- Paper
catchSalmon/catchKallisto (within edgeR)
- Estimation of mapping ambiguity overdispersion from transcript quantification of short-read RNA-seq data. It unlocks uncertainty-free differential expression assessment at the transcript level within edgeR.
- R/Bioconductor
- Workflow and User’s Guide
- Paper
epigraHMM
- A toolkit for the analysis of epigenomic datasets such as ChIP-seq, ATAC-seq, CUT&RUN, and CUT&Tag. It performs differential and consensus peak calling from multi-sample multi-condition datasets.
- R/Bioconductor
- Vignette
- Paper
ZIMHMM
- A consensus peak caller for epigenomic datasets. It implements a fast hidden Markov model with mixed-effects zero-inflated negative binomial emissions using sample-specific random effects.
- GitHub
- Paper
