Data management

Tools to process, organize, and query data. The classes are derived from pandas.DataFrame, so any pandas DataFrame operation also works on them.

BaseData is the core DataFrame subclass. It provides commonly used functions and is the base class of both “Sample” and “EnrichmentResult”.

BaseData

class magine.data.base.BaseData(*args, **kwargs)

Bases: pandas.core.frame.DataFrame

This class is derived from pd.DataFrame.

heatmap(subset=None, subset_index=None, convert_to_log=True, y_tick_labels='auto', cluster_row=False, cluster_col=False, cluster_by_set=False, index=None, values=None, columns=None, annotate_sig=True, figsize=(8, 12), div_colors=True, linewidths=0, num_colors=21, sort_row=None, min_sig=0, rank_index=None)

Creates a heatmap of the data, handling the pivot and formatting.

Parameters

subset: list or str: Filters the data to only the provided list. If a str, filters based on .contains(subset)
subset_index: str: Index for the subset list to match against
convert_to_log: bool: Convert values to log2 scale
y_tick_labels: str: Column of values, default = ‘auto’
cluster_row: bool
cluster_col: bool
cluster_by_set: bool: Clusters by gene set, only used in the EnrichmentResult derived class
index: str: Index of heatmap, will be the ‘row’ variables
values: str: Values to display in heatmap
columns: str: Value that will be used as columns
annotate_sig: bool: Add ‘+’ annotation to mark entries where ‘significant=True’
figsize: tuple: Figure size to pass to matplotlib
div_colors: bool: Use colors that are divergent (red to blue, instead of shades of blue)
num_colors: int: How many colors to include on the color bar
linewidths: float: Line width between individual cols and rows
sort_row: str: Rank by ‘mean’, ‘max’, ‘min’ or ‘index’
min_sig: int: Minimum number of samples in which an ‘index’ entry must be significant. Can be used to remove rows that are not significant in any sample.
rank_index: bool: Deprecated, use sort_row=‘index’ to sort alphabetically

Returns

matplotlib.figure

log2_normalize_df(column='fold_change', inplace=False)

Convert “fold_change” column to log2.

Does so by taking log2 of all positive values and -log2 of all negative values.

Parameters

column: str: Column to convert
inplace: bool: Whether to apply log2 in place or return a new dataframe
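
The signed transform described above can be sketched with plain pandas and NumPy. This is an illustration, not MAGINE's implementation: it assumes that "-log2 of all negative values" means -log2 of the absolute value, and the helper name `signed_log2` is hypothetical.

```python
import numpy as np
import pandas as pd


def signed_log2(values):
    """Hypothetical helper: log2(x) for x > 0, -log2(|x|) for x < 0.

    Zeros are left as NaN because log2(0) is undefined.
    """
    values = pd.Series(values, dtype=float)
    out = pd.Series(np.nan, index=values.index, dtype=float)
    pos = values > 0
    neg = values < 0
    out[pos] = np.log2(values[pos])
    out[neg] = -np.log2(-values[neg])  # assumption: magnitude of negatives is used
    return out


df = pd.DataFrame({
    "identifier": ["A", "B", "C"],
    "fold_change": [4.0, -8.0, 2.0],
})
df["log2_fc"] = signed_log2(df["fold_change"])
print(df)
```

With this convention a 4-fold increase and a 4-fold decrease end up at the same distance from zero, which is what makes the values comparable on a heatmap or volcano plot.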

pivoter(convert_to_log=False, columns='sample_id', values='fold_change', index=None, fill_value=None, min_sig=0)

Pivot data on provided axis.

Parameters

convert_to_log: bool: Convert values column to log2
index: str: Index for pivot table
columns: str: Columns to pivot
values: str: Values of pivot table
fill_value: float, optional: Value used to fill NaNs in the pivot table
min_sig: int: Required number of significant terms to keep in a row, default 0
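
As a rough picture of what a pivot with these arguments produces, the sketch below uses `pandas.pivot_table` directly on a toy long-format table. It is an assumption that pivoter behaves similarly; the column names mirror the defaults in the signature, and the data are made up for illustration.

```python
import pandas as pd

# Toy long-format data: one row per (identifier, sample_id) measurement.
df = pd.DataFrame({
    "identifier": ["A", "A", "B"],
    "sample_id": ["1hr", "6hr", "1hr"],
    "fold_change": [2.0, 4.0, -2.0],
})

# Rows become identifiers, columns become samples, cells hold fold changes.
# fill_value replaces the NaN for the missing (B, 6hr) measurement.
wide = pd.pivot_table(
    df,
    index="identifier",
    columns="sample_id",
    values="fold_change",
    fill_value=0.0,
)
print(wide)
```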

present_in_all_columns(columns='sample_id', index=None, inplace=False)

Require index to be present in all columns

Parameters

columns: str: Columns to consider
index: str, list: The column with which to filter by counts
inplace: bool: Filter in place or return a copy of the filtered data

Returns

new_data: BaseData
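
The filtering idea, keeping only index entries that occur in every column value, can be sketched in plain pandas as follows. This is a minimal illustration under assumed column names (`identifier`, `sample_id`), not the library's code.

```python
import pandas as pd

df = pd.DataFrame({
    "identifier": ["A", "A", "B", "C", "C"],
    "sample_id": ["1hr", "6hr", "1hr", "1hr", "6hr"],
})

# Count in how many distinct samples each identifier appears.
n_samples = df["sample_id"].nunique()
counts = df.groupby("identifier")["sample_id"].nunique()

# Keep only identifiers seen in every sample.
keep = counts[counts == n_samples].index
filtered = df[df["identifier"].isin(keep)]
print(filtered)
```

Here "B" is dropped because it was measured only at 1hr.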

require_n_sig(columns='sample_id', index=None, n_sig=3, inplace=False, verbose=False)

Filter index to entries with at least n_sig significant species.

Parameters

columns: str: Columns to consider
index: str, list: The column with which to filter by counts
n_sig: int: Number of significant terms required for an entry to be kept
inplace: bool: Filter in place or return a copy of the filtered data
verbose: bool

Returns

new_data: BaseData
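
A comparable sketch for the significance-count filter is shown below. It assumes a boolean `significant` column and counts, per identifier, the number of samples in which that flag is set; the actual method may count differently.

```python
import pandas as pd

df = pd.DataFrame({
    "identifier": ["A", "A", "A", "B", "B", "B"],
    "sample_id": ["1hr", "6hr", "24hr"] * 2,
    "significant": [True, True, False, True, False, False],
})

n_sig = 2  # threshold chosen for this toy example

# Number of distinct samples in which each identifier is flagged significant.
sig_counts = (
    df[df["significant"]]
    .groupby("identifier")["sample_id"]
    .nunique()
)

# Keep all rows of identifiers that meet the threshold.
keep = sig_counts[sig_counts >= n_sig].index
filtered = df[df["identifier"].isin(keep)]
print(filtered)
```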

property sig: Terms with the significant flag

Species data

class magine.data.experimental_data.Sample(*args, **kwargs)

Bases: magine.data.base.BaseData

Provides tools for subsets of data types

property by_sample: List of significantly flagged species by sample

property down: Return down regulated species

property down_by_sample: List of down regulated species by sample

property exp_methods: List of experimental methods in data

property id_list: Set of species identifiers

property label_list: Set of species labels

plot_all(html_file_name, out_dir='out', plot_type='plotly', run_parallel=False)

Creates a plot of all species

Parameters

html_file_name: str: Filename to save html of all plots
out_dir: str, path: Directory that will contain the output plots
plot_type: str: plotly or matplotlib output
run_parallel: bool: Create the plots in parallel

plot_histogram(save_name=None, y_range=None, out_dir=None)

Plots a histogram of data

Parameters

save_name: str: Name of figure
out_dir: str, path: Path to location to save figure
y_range: array_like: range of data

plot_pie_sig_ratio(save_name=None, ax=None, fig=None, figsize=None)

Parameters

save_name: str
ax: matplotlib.axes, optional
fig: matplotlib.figure
figsize: tuple: Size of figure

plot_species(species_list=None, subset_index=None, save_name=None, out_dir=None, title=None, plot_type='plotly', image_format='png')

Create scatter plot of species list

Parameters

species_list: list: List of compounds
subset_index: list: Column to filter based on species_list
save_name: str: Name of html output file
out_dir: str: Location to place plots
title: str: Title for HTML page
plot_type: str: Type of plot outputs, can be “plotly” or “matplotlib”
image_format: str: pdf or png, only used if plot_type=“matplotlib”

Returns

matplotlib.Figure or plotly.Figure

property sample_ids: List of sample_ids in data

subset(species=None, index='identifier', sample_ids=None, exp_methods=None)

Parameters

species: list, str: List of species to create subset dataframe from
index: str: Index to filter based on provided ‘species’ list
sample_ids: str, list: List or string to filter sample
exp_methods: str, list: List or string to filter sample

Returns

magine.data.experimental_data.Sample
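
The combined filtering that subset describes can be approximated in plain pandas. The sketch below is illustrative only: `subset_rows` is a hypothetical stand-in, and the `sample_id` and `exp_method` column names are assumptions about the data layout.

```python
import pandas as pd

df = pd.DataFrame({
    "identifier": ["A", "B", "C", "A"],
    "sample_id": ["1hr", "1hr", "6hr", "6hr"],
    "exp_method": ["rna_seq", "label_free", "rna_seq", "label_free"],
})


def subset_rows(data, species=None, index="identifier",
                sample_ids=None, exp_methods=None):
    """Hypothetical helper mimicking the documented filter arguments."""

    def as_list(x):
        # Accept either a single string or an iterable of strings.
        return [x] if isinstance(x, str) else list(x)

    out = data
    if species is not None:
        out = out[out[index].isin(as_list(species))]
    if sample_ids is not None:
        out = out[out["sample_id"].isin(as_list(sample_ids))]
    if exp_methods is not None:
        out = out[out["exp_method"].isin(as_list(exp_methods))]
    return out.copy()


picked = subset_rows(df, species=["A", "C"], sample_ids="6hr")
print(picked)
```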

property up: Return up regulated species

property up_by_sample: List of up regulated species by sample
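
One plausible reading of the up/down properties is sketched below in plain pandas. It assumes that "up regulated" means flagged significant with a positive fold change and "down regulated" means significant with a negative one; the source does not state the exact rule, and the column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "identifier": ["A", "B", "C", "A", "B"],
    "sample_id": ["1hr", "1hr", "1hr", "6hr", "6hr"],
    "fold_change": [2.0, -3.0, 1.1, 4.0, -2.5],
    "significant": [True, True, False, True, False],
})

sig = df[df["significant"]]
up = sig[sig["fold_change"] > 0]      # assumed definition of "up"
down = sig[sig["fold_change"] < 0]    # assumed definition of "down"

# One set of identifiers per sample, in sorted sample order.
samples = sorted(df["sample_id"].unique())
up_by_sample = [set(up.loc[up["sample_id"] == s, "identifier"]) for s in samples]
down_by_sample = [set(down.loc[down["sample_id"] == s, "identifier"]) for s in samples]
print(samples, up_by_sample, down_by_sample)
```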

volcano_by_sample(save_name=None, p_value=0.1, out_dir=None, fold_change_cutoff=1.5, y_range=None, x_range=None, sig_column=False)

Creates a figure of subplots of provided experimental method

Parameters

save_name: str: Name to save figure
out_dir: str, directory: Location to save figure
sig_column: bool, optional: Whether to use the significant flags in the data
p_value: float, optional: p-value threshold for significance
fold_change_cutoff: float, optional: Fold change threshold for significance
y_range: array_like: Upper and lower bounds of plot in y direction
x_range: array_like: Upper and lower bounds of plot in x direction

volcano_plot(save_name=None, out_dir=None, sig_column=False, p_value=0.1, fold_change_cutoff=1.5, x_range=None, y_range=None)

Create a volcano plot of data

Parameters

save_name: str: Name to save figure
out_dir: str, directory: Location to save figure
sig_column: bool, optional: Whether to use the significant flags in the data
p_value: float, optional: p-value threshold for significance
fold_change_cutoff: float, optional: Fold change threshold for significance
y_range: array_like: Upper and lower bounds of plot in y direction
x_range: array_like: Upper and lower bounds of plot in x direction

Returns

matplotlib.Figure
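
To make the two thresholds concrete, the sketch below computes volcano-plot coordinates and a significance mask for a toy table without drawing anything. The rule used here (p-value at or below `p_value` and absolute fold change at or above `fold_change_cutoff`) and the `p_value` column name are assumptions; the plotting methods may apply the cutoffs differently.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "identifier": ["A", "B", "C", "D"],
    "fold_change": [3.0, -2.0, 1.2, -4.0],
    "p_value": [0.01, 0.04, 0.001, 0.5],
})

p_cut = 0.1    # same default as p_value in the signature
fc_cut = 1.5   # same default as fold_change_cutoff in the signature

# x axis: signed log2 fold change, y axis: -log10 of the p-value.
df["x"] = np.sign(df["fold_change"]) * np.log2(df["fold_change"].abs())
df["y"] = -np.log10(df["p_value"])

# Assumed rule: both thresholds must be met for a point to count as significant.
df["is_sig"] = (df["p_value"] <= p_cut) & (df["fold_change"].abs() >= fc_cut)
print(df)
```

Points "C" (small fold change) and "D" (large p-value) fall outside the significant region, which is the pattern a volcano plot is meant to show visually.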

class magine.data.experimental_data.ExperimentalData(data_file)

Bases: object

Manages all experimental data

property compounds

Only compounds in data

Returns

Sample

create_summary_table(sig=False, index='identifier', save_name=None, plot=False, write_latex=False)

Creates a summary table of data.

Parameters

sig: bool: Flag to summarize significant species only
save_name: str: Name to save csv and .tex file
index: str: Index for counts
plot: bool: If you want to create a plot of the table
write_latex: bool: Create latex file of table

Returns

pandas.DataFrame

property exp_methods: List of source columns

property genes

All data tagged with gene

Includes protein and RNA.

get_measured_by_datatype()

Returns dict of species per data type

Returns

dict

property proteins

Protein level data

Tagged with “gene” identifier that is not RNA

property rna

RNA level data

Tagged with “RNA”

property sample_ids: List of sample_ids

property species

Returns data in Sample format

Returns

Sample

subset(species, index='identifier')

Parameters

species: list, str: List of species to create subset dataframe from
index: str: Index to filter based on provided ‘species’ list

Returns

magine.data.experimental_data.Sample

volcano_analysis(out_dir, use_sig_flag=True, p_value=0.1, fold_change_cutoff=1.5)

Creates a volcano plot for each experimental method

Parameters

out_dir: str, path: Path to where the output figures will be saved
use_sig_flag: bool: Use the significant flag of the data
p_value: float, optional: p-value threshold for significance. Not used if use_sig_flag is True
fold_change_cutoff: float, optional: Fold change threshold for significance. Not used if use_sig_flag is True