About

I am a Senior Research Officer in the Bioinformatics and Computational Biology Division at WEHI, Australia. Previously, I was an Assistant Professor in the Department of Biostatistics and Health Data Science at the University of Pittsburgh, United States. I completed my postdoctoral training in statistical bioinformatics in the laboratory of Prof. Gordon Smyth at WEHI. I received my PhD in Biostatistics from the University of North Carolina at Chapel Hill, United States, under the supervision of A/Prof. Naim Rashid and Prof. Joseph Ibrahim.

My research focuses on developing statistical methods and open-source bioinformatics tools for analyzing data from a wide range of high-throughput genomic, transcriptomic, and proteomic technologies. I enjoy creating new statistical methods and writing efficient bioinformatics tools that help researchers interpret their data. I am particularly interested in developing methods and software for the Bioconductor Project. A key goal of the methods I develop is to identify molecular features, such as genomic regions, genes/transcripts, or proteins, that change in accessibility, expression, or abundance between experimental conditions.

Projects

  • Differential transcript usage with limma and edgeR
    • An uncertainty-aware method for differential transcript usage analysis based on divided counts. From Bioconductor 3.21 onward, diffSplice is a generic function with methods for limma MArrayLM objects and edgeR DGEGLM objects.
    • limma and edgeR
    • Workflow
    • Paper
  • Differential transcript expression with edgeR v4
    • An improved differential transcript expression pipeline that combines Salmon’s Gibbs sampling with the new bias-corrected quasi-likelihood method from edgeR v4, which uses adjusted deviances for small counts.
    • R/Bioconductor
    • Workflow
    • Paper
  • catchSalmon/catchKallisto (within edgeR)
    • Estimates the overdispersion caused by read-to-transcript mapping ambiguity from transcript quantifications of short-read RNA-seq data. Dividing the counts by these estimates enables transcript-level differential expression analysis within edgeR that accounts for quantification uncertainty.
    • R/Bioconductor
    • Workflow and User’s Guide
    • Paper
  • epigraHMM
    • A toolkit for the analysis of epigenomic datasets such as ChIP-seq, ATAC-seq, CUT&RUN, and CUT&Tag. It performs differential and consensus peak calling from multi-sample, multi-condition datasets.
    • R/Bioconductor
    • Vignette
    • Paper
  • ZIMHMM
    • A consensus peak caller for epigenomic datasets. It implements a fast hidden Markov model with mixed-effects zero-inflated negative binomial emissions using sample-specific random effects.
    • GitHub
    • Paper
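The "divided counts" idea behind catchSalmon/catchKallisto can be illustrated with a small sketch. The snippet below is written in Python rather than R purely for illustration and is not edgeR's implementation: it assumes bootstrap (or Gibbs) replicate counts are available per sample as a transcripts-by-replicates array, estimates a per-transcript overdispersion as a pooled variance-to-mean ratio floored at 1, and divides the observed counts by it. The function names `mapping_overdispersion` and `divided_counts` are hypothetical, and the sketch omits any moderation across transcripts that the real implementation may apply.

```python
import numpy as np


def mapping_overdispersion(boot_by_sample):
    """Hypothetical simplified estimator of mapping-ambiguity overdispersion.

    boot_by_sample: list of arrays, one per sample, each of shape
    (n_transcripts, n_replicates) holding bootstrap/Gibbs counts.
    Returns one overdispersion value per transcript, floored at 1.
    """
    n_tx = boot_by_sample[0].shape[0]
    stat = np.zeros(n_tx)  # pooled sum of squared deviations / mean
    df = np.zeros(n_tx)    # pooled degrees of freedom
    for boot in boot_by_sample:
        n_rep = boot.shape[1]
        mean = boot.mean(axis=1)
        ok = mean > 0  # skip transcripts with no counts in this sample
        dev = boot[ok] - mean[ok][:, None]
        stat[ok] += (dev ** 2).sum(axis=1) / mean[ok]
        df[ok] += n_rep - 1
    overdisp = np.ones(n_tx)
    has_df = df > 0
    overdisp[has_df] = stat[has_df] / df[has_df]
    # Poisson-level variability corresponds to 1; never scale counts up
    return np.maximum(overdisp, 1.0)


def divided_counts(counts, overdisp):
    """Scale a (n_transcripts, n_samples) count matrix by per-transcript overdispersion."""
    return counts / overdisp[:, None]


# Toy example: transcript 0 is quantified without ambiguity,
# transcript 1 flips between 0 and 20 across replicates.
boot_s1 = np.array([[10, 10, 10, 10], [0, 20, 0, 20]], dtype=float)
boot_s2 = boot_s1.copy()
od = mapping_overdispersion([boot_s1, boot_s2])
counts = np.array([[100, 120], [40, 80]], dtype=float)
scaled = divided_counts(counts, od)
print(od)
print(scaled)
```

In the toy data, the unambiguous transcript keeps its counts, while the ambiguous one is scaled down so that downstream count models see its reduced effective precision. In edgeR itself, the divided counts would then enter the usual DGEList-based workflow.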

Timeline

Pedro Baldoni

