Data management

Tools to process, organize, and query data. The classes are derived from pandas.DataFrame, meaning everything you can do with pandas you can do with MAGINE.

BaseData is the core DataFrame. We provide functions that are commonly used. This class is used by both “Sample” and “EnrichmentResult”.
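As a rough illustration of what deriving from pandas.DataFrame provides, the sketch below defines a hypothetical subclass with one custom property. MiniData and the column names are assumptions made for this example; it is not MAGINE's implementation.

```python
import pandas as pd


class MiniData(pd.DataFrame):
    """Hypothetical DataFrame subclass, for illustration only."""

    @property
    def _constructor(self):
        # Tell pandas to return MiniData (not a plain DataFrame) from operations
        return MiniData

    @property
    def sig(self):
        # Custom helper: rows whose 'significant' flag is True
        return self.loc[self["significant"]]


data = MiniData({
    "identifier": ["A", "B", "C"],
    "fold_change": [2.0, -3.0, 1.1],
    "significant": [True, True, False],
})

# Ordinary pandas calls still work on the subclass...
mean_fc = data["fold_change"].mean()
# ...and the custom helper returns the subclass, so calls can be chained.
sig_ids = sorted(data.sig["identifier"])
```

Because `_constructor` returns the subclass, filtered results keep the custom helpers available.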

BaseData

class magine.data.base.BaseData(*args, **kwargs)[source]

Bases: pandas.core.frame.DataFrame

This class is derived from pd.DataFrame.

heatmap(subset=None, subset_index=None, convert_to_log=True, y_tick_labels='auto', cluster_row=False, cluster_col=False, cluster_by_set=False, index=None, values=None, columns=None, annotate_sig=True, figsize=(8, 12), div_colors=True, linewidths=0, num_colors=21, sort_row=None, min_sig=0, rank_index=None)[source]

Creates heatmap of data, providing pivot and formatting.

Parameters
subset: list or str

Filters the data to the provided list. If a str, filters based on .contains(subset)

subset_index: str

Index that the subset list is matched against

convert_to_log: bool

Convert values to log2 scale

y_tick_labels: str

Column of values, default = ‘auto’

cluster_row: bool
cluster_col: bool
cluster_by_set: bool

Clusters by gene set, only used in EnrichmentResult derived class

index: str

Index of heatmap, will be ‘row’ variables

values: str

Values to display in heatmap

columns: str

Value that will be used as columns

annotate_sig: bool

Add ‘+’ annotation to entries where ‘significant=True’

figsize: tuple

Figure size to pass to matplotlib

div_colors: bool

Use colors that are divergent (red to blue, instead of shades of blue)

num_colors: int

How many colors to include on color bar

linewidths: float

Line width between individual cols and rows

sort_row: str

Rank by ‘mean’, ‘max’, ‘min’ or ‘index’

min_sig: int

Minimum number of significant ‘index’ across samples. Can be used to remove rows that are not significant across any sample.

rank_index: bool

Deprecated, please use sort_row=‘index’ to sort alphabetically

Returns
matplotlib.figure

log2_normalize_df(column='fold_change', inplace=False)[source]

Convert “fold_change” column to log2.

Does so by taking log2 of all positive values and -log2 of the absolute value of all negative values.

Parameters
column: str

Column to convert

inplace: bool

Whether to apply log2 in place or return a new dataframe
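A minimal sketch of the signed log2 rule described above, written with the standard library only. It assumes negative values are converted through their magnitude (log2 of a negative number is undefined) and that zeros are left unchanged; signed_log2 is a hypothetical helper, not part of MAGINE.

```python
import math


def signed_log2(value):
    """Signed log2: log2(x) for x > 0, -log2(|x|) for x < 0."""
    if value > 0:
        return math.log2(value)
    if value < 0:
        # Assumption: use the magnitude, since log2 of a negative is undefined
        return -math.log2(-value)
    # Zero has no defined log2; this sketch leaves it unchanged
    return value


fold_changes = [4.0, -4.0, 2.0, -8.0]
log2_values = [signed_log2(fc) for fc in fold_changes]
```

With this convention a 4-fold increase and a 4-fold decrease map to values of equal size and opposite sign.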

pivoter(convert_to_log=False, columns='sample_id', values='fold_change', index=None, fill_value=None, min_sig=0)[source]

Pivot data on provided axis.

Parameters
convert_to_log: bool

Convert values column to log2

index: str

Index for pivot table

columns: str

Columns to pivot

values: str

Values of pivot table

fill_value: float, optional

Value used to fill NaNs in the pivot table

min_sig: int

Required number of significant terms to keep in a row, default 0
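The parameters above map closely onto a plain pandas pivot. The sketch below uses pandas.pivot_table directly on a small made-up table to show what index, columns, values and fill_value control; it is an analogy under that assumption, not the pivoter implementation.

```python
import pandas as pd

data = pd.DataFrame({
    "identifier": ["A", "A", "B", "B", "C"],
    "sample_id": ["t1", "t2", "t1", "t2", "t1"],
    "fold_change": [2.0, 4.0, -2.0, 1.5, 3.0],
})

# One row per identifier, one column per sample_id, fold_change as the cells.
# fill_value replaces the NaN that would appear where "C" has no "t2" entry.
table = pd.pivot_table(
    data,
    index="identifier",
    columns="sample_id",
    values="fold_change",
    fill_value=0.0,
)
```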

present_in_all_columns(columns='sample_id', index=None, inplace=False)[source]

Require index to be present in all columns

Parameters
columns: str

Columns to consider

index: str, list

Column whose values are counted and filtered

inplace: bool

Filter in place or return a copy of the filtered data

Returns
new_data: BaseData
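One way to picture this filter in plain pandas is shown below. The table and the use of "identifier" as the index column are assumptions for the example; the method itself may be implemented differently.

```python
import pandas as pd

data = pd.DataFrame({
    "identifier": ["A", "A", "B", "C", "C"],
    "sample_id": ["t1", "t2", "t1", "t1", "t2"],
})

n_samples = data["sample_id"].nunique()
# Number of distinct samples in which each identifier appears
samples_per_id = data.groupby("identifier")["sample_id"].nunique()
# Keep identifiers seen in every sample
keep = samples_per_id[samples_per_id == n_samples].index
filtered = data[data["identifier"].isin(keep)]
```

Here "B" is dropped because it was only measured in "t1".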
require_n_sig(columns='sample_id', index=None, n_sig=3, inplace=False, verbose=False)[source]

Filter index to have at least “n_sig” significant species.

Parameters
columns: str

Columns to consider

index: str, list

Column whose values are counted and filtered

n_sig: int

Number of significant terms required to avoid being filtered out

inplace: bool

Filter in place or return a copy of the filtered data

verbose: bool

Returns
new_data: BaseData
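The counting behind this filter can be sketched in plain pandas as follows. The data are made up, and the choice to keep every row of a passing identifier (not only its significant rows) is an assumption of this sketch.

```python
import pandas as pd

data = pd.DataFrame({
    "identifier": ["A", "A", "A", "B", "B", "B"],
    "sample_id": ["t1", "t2", "t3", "t1", "t2", "t3"],
    "significant": [True, True, True, True, False, False],
})

n_sig = 3
# Count, per identifier, the distinct samples in which it is flagged significant
sig_counts = data[data["significant"]].groupby("identifier")["sample_id"].nunique()
keep = sig_counts[sig_counts >= n_sig].index
filtered = data[data["identifier"].isin(keep)]
```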
property sig

Terms flagged as significant

Species data

class magine.data.experimental_data.Sample(*args, **kwargs)[source]

Bases: magine.data.base.BaseData

Provides tools for subsets of data types

property by_sample

List of significantly flagged species by sample

property down

Returns down-regulated species

property down_by_sample

List of down-regulated species by sample

property exp_methods

List of experimental methods in data

property id_list

Set of species identifiers

property label_list

Set of species labels

plot_all(html_file_name, out_dir='out', plot_type='plotly', run_parallel=False)[source]

Creates a plot of all species

Parameters
html_file_name: str

Filename to save html of all plots

out_dir: str, path

Directory that will contain all plots

plot_type: str

plotly or matplotlib output

run_parallel: bool

Create the plots in parallel

plot_histogram(save_name=None, y_range=None, out_dir=None)[source]

Plots a histogram of data

Parameters
save_name: str

Name of figure

out_dir: str, path

Path to location to save figure

y_range: array_like

range of data

plot_pie_sig_ratio(save_name=None, ax=None, fig=None, figsize=None)[source]

Parameters
save_name: str
ax: matplotlib.axes, optional
fig: matplotlib.figure
figsize: tuple

Size of figure

plot_species(species_list=None, subset_index=None, save_name=None, out_dir=None, title=None, plot_type='plotly', image_format='png')[source]

Create scatter plot of species list

Parameters
species_list: list

List of compounds

subset_index: list

Column to filter based on species_list

save_name: str

Name of html output file

out_dir: str

Location to place plots

title: str

Title for HTML page

plot_type: str

Type of plot outputs, can be “plotly” or “matplotlib”

image_format: str

pdf or png, only used if plot_type=“matplotlib”

Returns
matplotlib.Figure or plotly.Figure

property sample_ids

List of sample_ids in data

subset(species=None, index='identifier', sample_ids=None, exp_methods=None)[source]

Parameters
species: list, str

List of species to create subset dataframe from

index: str

Index to filter based on provided ‘species’ list

sample_ids: str, list

List or string of sample_ids to keep

exp_methods: str, list

List or string of experimental methods to keep

Returns
magine.data.experimental_data.Sample
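The kind of selection these parameters describe can be sketched with pandas boolean masks. The table is made up, and combining the species and sample filters with a logical AND is an assumption of this example.

```python
import pandas as pd

data = pd.DataFrame({
    "identifier": ["A", "B", "C", "A"],
    "sample_id": ["t1", "t1", "t2", "t2"],
    "fold_change": [2.0, -2.0, 1.5, 3.0],
})

species = ["A", "C"]   # values to match in the column named by `index`
sample_ids = ["t2"]    # samples to keep

# Assumption: both conditions must hold for a row to be kept
mask = data["identifier"].isin(species) & data["sample_id"].isin(sample_ids)
selected = data[mask]
```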
property up

Returns up-regulated species

property up_by_sample

List of up-regulated species by sample
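A rough pandas sketch of how up- and down-regulated species could be grouped by sample. It assumes direction is taken from the sign of fold_change and that only rows flagged significant are counted; both are assumptions of this example.

```python
import pandas as pd

data = pd.DataFrame({
    "identifier": ["A", "B", "C", "A", "B"],
    "sample_id": ["t1", "t1", "t1", "t2", "t2"],
    "fold_change": [2.0, -3.0, 5.0, -2.0, -4.0],
    "significant": [True, True, False, True, True],
})

sig = data[data["significant"]]
up = sig[sig["fold_change"] > 0]
down = sig[sig["fold_change"] < 0]

# One list of identifiers per sample, in sorted sample order
sample_ids = sorted(data["sample_id"].unique())
up_by_sample = [
    sorted(up.loc[up["sample_id"] == s, "identifier"]) for s in sample_ids
]
down_by_sample = [
    sorted(down.loc[down["sample_id"] == s, "identifier"]) for s in sample_ids
]
```

"C" is absent from both lists because it is not flagged significant.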

volcano_by_sample(save_name=None, p_value=0.1, out_dir=None, fold_change_cutoff=1.5, y_range=None, x_range=None, sig_column=False)[source]

Creates a figure of subplots of provided experimental method

Parameters
save_name: str

Name to save figure

out_dir: str, directory

Location to save figure

sig_column: bool, optional

Whether to use the significant flags of the data

p_value: float, optional

p-value threshold for significance

fold_change_cutoff: float, optional

Fold-change threshold for significance

y_range: array_like

Upper and lower bounds of plot in y direction

x_range: array_like

Upper and lower bounds of plot in x direction

volcano_plot(save_name=None, out_dir=None, sig_column=False, p_value=0.1, fold_change_cutoff=1.5, x_range=None, y_range=None)[source]

Create a volcano plot of data

Parameters
save_name: str

Name to save figure

out_dir: str, directory

Location to save figure

sig_column: bool, optional

Whether to use the significant flags of the data

p_value: float, optional

p-value threshold for significance

fold_change_cutoff: float, optional

Fold-change threshold for significance

y_range: array_like

Upper and lower bounds of plot in y direction

x_range: array_like

Upper and lower bounds of plot in x direction

Returns
matplotlib.Figure
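To make the p_value and fold_change_cutoff criteria concrete, the sketch below computes conventional volcano-plot coordinates for a single measurement. volcano_point is a hypothetical helper; the axis choices (signed log2 fold change against -log10 p-value) and the inclusive comparisons are assumptions, and the fold change is assumed to be nonzero.

```python
import math


def volcano_point(fold_change, p_value, p_cutoff=0.1, fold_change_cutoff=1.5):
    """Return (x, y, is_significant) for one measurement."""
    # x axis: signed log2 fold change (negative input means a decrease)
    if fold_change > 0:
        x = math.log2(fold_change)
    else:
        x = -math.log2(-fold_change)
    # y axis: -log10 of the p-value, so small p-values plot high
    y = -math.log10(p_value)
    # Assumption: both criteria must hold, with inclusive thresholds
    is_significant = p_value <= p_cutoff and abs(fold_change) >= fold_change_cutoff
    return x, y, is_significant


strong = volcano_point(4.0, 0.01)
small_change = volcano_point(-1.2, 0.01)
weak_p = volcano_point(-4.0, 0.5)
```

Points passing both thresholds fall in the upper-left and upper-right regions of the plot.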
class magine.data.experimental_data.ExperimentalData(data_file)[source]

Bases: object

Manages all experimental data

property compounds

Only compounds in data

Returns
Sample

create_summary_table(sig=False, index='identifier', save_name=None, plot=False, write_latex=False)[source]

Creates a summary table of data.

Parameters
sig: bool

Flag to summarize significant species only

save_name: str

Name to save csv and .tex file

index: str

Index for counts

plot: bool

If you want to create a plot of the table

write_latex: bool

Create latex file of table

Returns
pandas.DataFrame
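A summary of this kind can be approximated with a pandas pivot that counts unique index values. The table, the "source" column name and the layout (one row per source, one column per sample) are assumptions for illustration and may differ from the table MAGINE produces.

```python
import pandas as pd

data = pd.DataFrame({
    "identifier": ["A", "B", "A", "C", "C"],
    "sample_id": ["t1", "t1", "t2", "t2", "t2"],
    "source": ["proteomics", "proteomics", "proteomics", "rnaseq", "rnaseq"],
    "significant": [True, False, True, True, True],
})

sig_only = True
rows = data[data["significant"]] if sig_only else data

# Count unique identifiers for each (source, sample_id) pair
summary = pd.pivot_table(
    rows,
    index="source",
    columns="sample_id",
    values="identifier",
    aggfunc="nunique",
    fill_value=0,
)
```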
property exp_methods

List of source columns

property genes

All data tagged with gene

Includes protein and RNA.

get_measured_by_datatype()[source]

Returns dict of species per data type

Returns
dict

property proteins

Protein level data

Tagged with “gene” identifier that is not RNA

property rna

RNA level data

Tagged with “RNA”
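The split between genes, proteins and RNA can be pictured as two successive filters. In the sketch below the column names "species_type" and "source", and the rule that RNA rows are those whose source contains "rna", are hypothetical stand-ins for however the data are actually tagged.

```python
import pandas as pd

data = pd.DataFrame({
    "identifier": ["TP53", "TP53", "glucose"],
    "species_type": ["gene", "gene", "metabolite"],   # hypothetical column
    "source": ["rna_seq", "label_free", "hilic"],     # hypothetical column
})

# All gene-tagged rows (protein and RNA together)
genes = data[data["species_type"] == "gene"]
# Assumption: RNA measurements are recognizable from the source name
is_rna = genes["source"].str.contains("rna", case=False)
rna = genes[is_rna]
proteins = genes[~is_rna]
```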

property sample_ids

List of sample_ids

property species

Returns data in Sample format

Returns
Sample

subset(species, index='identifier')[source]

Parameters
species: list, str

List of species to create subset dataframe from

index: str

Index to filter based on provided ‘species’ list

Returns
magine.data.experimental_data.Sample

volcano_analysis(out_dir, use_sig_flag=True, p_value=0.1, fold_change_cutoff=1.5)[source]

Creates a volcano plot for each experimental method

Parameters
out_dir: str, path

Path to where the output figures will be saved

use_sig_flag: bool

Use significant flag of data

p_value: float, optional

p-value criterion for significance. Not used if use_sig_flag is True

fold_change_cutoff: float, optional

Fold-change criterion for significance. Not used if use_sig_flag is True