Plot feature values by run order for reference and comparison data
Visualise omics feature values across run order for a set of target features, highlighting QC versus non-QC samples. The function is designed to assess technical variation associated with run order and to evaluate the influence of stratification factors such as plate or batch on feature measurements. It generates scatter plots of feature values over run order, optionally colouring points by plate and marking batch boundaries with vertical dotted lines.
If df_comp is supplied, the function returns a side-by-side comparison
of the reference and comparison data with a shared legend. Otherwise, it
returns only the reference plot. This feature can be used to assess the
effect of post-processing on the data (for example, normalisation).
Usage
plot_omics_distributions(
  df,
  target_cols,
  run_order,
  is_qc = NULL,
  batch = NULL,
  plate = NULL,
  title = NULL,
  df_comp = NULL,
  title_comp = NULL,
  point_size = 1
)
Arguments
- df
A data frame containing the reference data. It must include the columns referenced by target_cols and the column named in run_order. It may also include columns named in batch and plate.
- target_cols
Character vector giving the feature columns to plot. These columns are pivoted to long format and faceted with one panel per feature.
- run_order
Character scalar giving the name of the column encoding run order. This column is coerced to numeric and used as the x-axis.
- is_qc
Optional character scalar specifying the QC indicator column. If NULL, all samples are treated as non-QC.
- batch
Optional character scalar giving the name of the column encoding batch membership. If provided, vertical dotted lines are drawn at the end of each batch. If NULL, all samples are treated as a single batch and no boundary lines are drawn. Default is NULL.
- plate
Optional character scalar giving the name of the column encoding plate membership. If provided, points are coloured by plate. If NULL, all points are assigned to a single plate level "all". Default is NULL.
- title
Optional title for the reference plot. If df_comp is not NULL and title is NULL, the default title is "Reference". Default is NULL.
- df_comp
Optional data frame containing comparison data, for example normalised values. If provided, it must include the columns referenced by target_cols and the column named in run_order. It may also include columns named in batch and plate. Default is NULL.
- title_comp
Optional title for the comparison plot. If df_comp is not NULL and title_comp is NULL, the default title is "Comparison". Default is NULL.
- point_size
Numeric scalar giving the base point size for non-QC samples. QC points are drawn one unit larger. Default is 1.
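As an illustration of the inputs described above, the following sketch simulates a small data set and passes it to the function. All column names (run, batch_id, plate_id, qc, feat_a, feat_b) are hypothetical, and the type of the QC indicator column is not stated on this page, so a logical column is assumed. The call is guarded so that the snippet also runs in a session where plot_omics_distributions() is not available.

```r
# Simulated input; every column name here is an illustrative assumption.
set.seed(1)
n <- 60
df <- data.frame(
  run      = seq_len(n),                              # run order (x-axis)
  batch_id = rep(c("B1", "B2"), each = n / 2),        # two batches of 30 runs
  plate_id = rep(paste0("P", 1:4), each = n / 4),     # four plates of 15 runs
  qc       = seq_len(n) %% 10 == 0,                   # assumed logical QC flag: every 10th run
  feat_a   = rnorm(n, mean = 10) + seq_len(n) * 0.02, # feature with a mild drift
  feat_b   = rnorm(n, mean = 5)
)

# Guarded call: executed only if the function exists in the current session.
if (exists("plot_omics_distributions", mode = "function")) {
  p <- plot_omics_distributions(
    df,
    target_cols = c("feat_a", "feat_b"),
    run_order   = "run",
    is_qc       = "qc",
    batch       = "batch_id",
    plate       = "plate_id",
    point_size  = 1
  )
  print(p)  # the plot is returned invisibly, so print it explicitly
}
```

Column names are passed as character scalars, not as bare symbols, matching the argument descriptions above.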
Value
Invisibly returns a plot object:
- If df_comp is NULL, a single ggplot2 plot for the reference data.
- Otherwise, a combined patchwork plot with the reference and comparison plots shown side by side and a shared legend.
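To show what a combined return value of this kind looks like, here is a minimal sketch that builds two ggplot2 plots and joins them with patchwork::wrap_plots() and a collected legend. It is not the function's own code: the helper make_plot(), the toy data, and the per-plate centring used as a stand-in for normalisation are all assumptions made for illustration.

```r
library(ggplot2)
library(patchwork)

set.seed(1)
ref <- data.frame(
  run   = 1:20,
  value = rnorm(20, sd = 2),
  plate = rep(c("P1", "P2"), each = 10)
)
# Stand-in "normalisation": subtract the per-plate mean (assumption for the demo).
comp <- transform(ref, value = value - ave(value, plate))

# Hypothetical helper producing one panel per data set.
make_plot <- function(d, title) {
  ggplot(d, aes(run, value, colour = plate)) +
    geom_point() +
    ggtitle(title)
}

# Side-by-side layout; guides = "collect" merges the two legends into one.
combined <- wrap_plots(
  make_plot(ref, "Reference"),
  make_plot(comp, "Comparison"),
  ncol = 2,
  guides = "collect"
)
```

Because the documented function returns its result invisibly, a plot obtained from it appears only when it is printed or assigned and then printed.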
Details
Internally, the function:
- Selects the union of the columns {batch, plate, run_order} and target_cols, using dplyr::any_of() for optional columns and dplyr::all_of() for required feature columns.
- Adds an is_qc flag and pivots the data to long format with columns feature and value.
- Coerces the run-order column to numeric and sorts rows by run order.
- Creates a scatter plot of value versus run_order, faceted by feature with scales = "free_y".
- Maps point colour to plate, shape to QC status, alpha to QC status, and size to QC status.
- Uses a rainbow palette when multiple plate levels are present and "grey40" when only one plate level is present.
- Optionally adds vertical dotted lines at the maximum run order within each batch.
- If df_comp is provided, combines the reference and comparison plots with patchwork::wrap_plots() and collects guides into a shared legend.
QC samples are shown as filled circles and non-QC samples as triangles. Non-QC samples are also plotted with lower alpha.
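The steps listed above can be approximated with dplyr, tidyr and ggplot2 as in the sketch below. This is a reconstruction from the description, not the function's source: the column names, the alpha values (1 and 0.5), the use of rainbow() for the palette, and the exact order of operations are assumptions.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)
df <- data.frame(
  run      = as.character(1:40),   # stored as text to show the numeric coercion
  batch_id = rep(c("B1", "B2"), each = 20),
  plate_id = rep(c("P1", "P2", "P3", "P4"), each = 10),
  qc       = rep(c(FALSE, FALSE, FALSE, TRUE), times = 10),
  feat_a   = rnorm(40, mean = 10),
  feat_b   = rnorm(40, mean = 5)
)
target_cols <- c("feat_a", "feat_b")

long <- df %>%
  # optional columns via any_of(), required columns via all_of()
  select(any_of(c("batch_id", "plate_id", "qc")), all_of(c("run", target_cols))) %>%
  mutate(is_qc = qc, run = as.numeric(run)) %>%
  pivot_longer(all_of(target_cols), names_to = "feature", values_to = "value") %>%
  arrange(run)

# Batch boundaries: maximum run order within each batch.
bounds <- long %>%
  group_by(batch_id) %>%
  summarise(xint = max(run), .groups = "drop")

# Rainbow palette for several plates, a single grey otherwise.
n_plates <- n_distinct(long$plate_id)
pal <- if (n_plates > 1) rainbow(n_plates) else "grey40"

p <- ggplot(long, aes(run, value)) +
  geom_point(aes(colour = plate_id, shape = is_qc, alpha = is_qc, size = is_qc)) +
  geom_vline(data = bounds, aes(xintercept = xint), linetype = "dotted") +
  facet_wrap(~feature, scales = "free_y") +
  scale_colour_manual(values = pal) +
  scale_shape_manual(values = c(`TRUE` = 16, `FALSE` = 17)) +  # filled circle vs triangle
  scale_alpha_manual(values = c(`TRUE` = 1, `FALSE` = 0.5)) +  # assumed alpha levels
  scale_size_manual(values = c(`TRUE` = 2, `FALSE` = 1))       # QC one unit larger
```

With scales = "free_y", each feature panel gets its own y-axis range, so features on different intensity scales remain readable while sharing the run-order axis.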