Data management
Tools to process, organize, and query data. The classes are derived from pandas.DataFrame, so any operation available in pandas also works on MAGINE data objects.
BaseData is the core DataFrame and provides commonly used functions. It is the base class for both “Sample” and “EnrichmentResult”.
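As a rough illustration of what deriving from pandas.DataFrame means in practice, the sketch below defines a small DataFrame subclass. The class name `MiniData` and the column names are hypothetical, and this is not MAGINE's actual implementation; it only shows that pandas operations keep working and keep returning the subclass.

```python
import pandas as pd


class MiniData(pd.DataFrame):
    """Hypothetical stand-in for a DataFrame-derived class such as BaseData."""

    @property
    def _constructor(self):
        # Tell pandas to return MiniData (not a plain DataFrame) from operations.
        return MiniData

    @property
    def sig(self):
        # Rows whose 'significant' flag is True (column name is an assumption).
        return self[self['significant']]


data = MiniData({
    'identifier': ['A', 'B', 'C'],
    'fold_change': [2.0, -3.0, 1.1],
    'significant': [True, True, False],
})

# Ordinary pandas calls work and preserve the subclass type.
top = data.sort_values('fold_change', ascending=False).head(2)
print(type(top).__name__)
print(list(data.sig['identifier']))
```

Overriding `_constructor` is the mechanism pandas documents for subclassing, which is why custom properties can sit alongside the regular DataFrame API.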
BaseData
- class magine.data.base.BaseData(*args, **kwargs)
Bases: pandas.core.frame.DataFrame
This class is derived from pd.DataFrame.
- heatmap(subset=None, subset_index=None, convert_to_log=True, y_tick_labels='auto', cluster_row=False, cluster_col=False, cluster_by_set=False, index=None, values=None, columns=None, annotate_sig=True, figsize=(8, 12), div_colors=True, linewidths=0, num_colors=21, sort_row=None, min_sig=0, rank_index=None)
Creates a heatmap of the data, handling the pivot and formatting.
- Parameters
- subset: list or str
Filters the data to the provided list. If a str, filters based on .contains(subset)
- subset_index: str
Column that the subset list is matched against
- convert_to_log: bool
Convert values to log2 scale
- y_tick_labels: str
Column of values, default = ‘auto’
- cluster_row: bool
- cluster_col: bool
- cluster_by_set: bool
Clusters by gene set, only used in EnrichmentResult derived class
- index: str
Index of heatmap, will be ‘row’ variables
- values: str
Values to display in heatmap
- columns: str
Value that will be used as columns
- annotate_sig: bool
Add a ‘+’ annotation to entries flagged ‘significant=True’
- figsize: tuple
Figure size to pass to matplotlib
- div_colors: bool
Use colors that are divergent (red to blue, instead of shades of blue)
- num_colors: int
How many colors to include on color bar
- linewidths: float
Line width between individual cols and rows
- sort_row: str
Rank by ‘mean’, ‘max’, ‘min’ or ‘index’
- min_sig: int
Minimum number of significant ‘index’ across samples. Can be used to remove rows that are not significant in any sample.
- rank_index: bool
Deprecated, please use sort_row=‘index’ to sort alphabetically
- Returns
- matplotlib.figure
- log2_normalize_df(column='fold_change', inplace=False)
Convert “fold_change” column to log2.
Positive values x are replaced by log2(x) and negative values by -log2(|x|), so the sign of the change is preserved.
- Parameters
- column: str
Column to convert
- inplace: bool
Whether to apply log2 in place or return a new dataframe
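The signed transform described above can be sketched in plain Python. The function name `signed_log2` is hypothetical, and how zero is treated here is an assumption rather than documented MAGINE behaviour.

```python
import math


def signed_log2(value):
    """Signed log2 of a fold change: log2(x) for x > 0, -log2(|x|) for x < 0."""
    if value > 0:
        return math.log2(value)
    if value < 0:
        return -math.log2(-value)
    # Assumption: zero has no defined log2, so it is passed through unchanged.
    return 0.0


fold_changes = [4.0, -4.0, 1.0, -2.0]
log2_changes = [signed_log2(v) for v in fold_changes]
print(log2_changes)  # [2.0, -2.0, 0.0, -1.0]
```

A fold change of 4 and one of -4 end up the same distance from zero, which is what makes the values suitable for a divergent color scale.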
- pivoter(convert_to_log=False, columns='sample_id', values='fold_change', index=None, fill_value=None, min_sig=0)
Pivot data on provided axis.
- Parameters
- convert_to_log: bool
Convert values column to log2
- index: str
Index for pivot table
- columns: str
Columns to pivot
- values: str
Values of pivot table
- fill_value: float, optional
Value used to fill NaNs in the pivot table
- min_sig: int
Required number of significant terms to keep in a row, default 0
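The kind of reshaping these parameters describe can be sketched with plain pandas. This is an illustration using pandas.pivot_table, not the body of pivoter; the toy data and the choice of ‘identifier’ as the index are assumptions.

```python
import pandas as pd

# Long-format measurements; using 'identifier' as the index is an assumption.
long_df = pd.DataFrame({
    'identifier': ['A', 'A', 'B'],
    'sample_id': ['s1', 's2', 's1'],
    'fold_change': [2.5, -1.5, 3.5],
})

# One row per identifier, one column per sample, missing cells filled with 0.
wide = pd.pivot_table(
    long_df,
    index='identifier',
    columns='sample_id',
    values='fold_change',
    fill_value=0,
)
print(wide)
```

Here ‘B’ has no measurement in sample ‘s2’, so that cell takes the fill value instead of NaN.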
- present_in_all_columns(columns='sample_id', index=None, inplace=False)
Keep only index values that are present in every column (for example, in every sample_id).
- Parameters
- columns: str
Columns to consider
- index: str, list
The column with which to filter by counts
- inplace: bool
Filter in place or return a copy of the filtered data
- Returns
- new_data: BaseData
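A minimal pandas sketch of this filter, assuming ‘identifier’ is the index column and ‘sample_id’ the column to check. It illustrates the idea only and is not taken from the MAGINE source.

```python
import pandas as pd

df = pd.DataFrame({
    'identifier': ['A', 'A', 'A', 'B', 'C', 'C'],
    'sample_id': ['s1', 's2', 's3', 's1', 's1', 's3'],
})

# Number of distinct samples overall, and per identifier.
n_samples = df['sample_id'].nunique()
samples_per_id = df.groupby('identifier')['sample_id'].nunique()

# Keep identifiers that were measured in every sample.
keep = samples_per_id[samples_per_id == n_samples].index
in_all = df[df['identifier'].isin(keep)]
print(sorted(in_all['identifier'].unique()))
```

Only ‘A’ appears in all three samples, so the rows for ‘B’ and ‘C’ are dropped.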
- require_n_sig(columns='sample_id', index=None, n_sig=3, inplace=False, verbose=False)
Filter index to have at least n_sig significant species.
- Parameters
- columns: str
Columns to consider
- index: str, list
The column with which to filter by counts
- n_sig: int
Minimum number of significant terms required for an entry to be kept
- inplace: bool
Filter in place or return a copy of the filtered data
- verbose: bool
- Returns
- new_data: BaseData
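The counting logic behind such a filter might look like the sketch below. The column names and the decision to keep all rows of a passing identifier (rather than only its significant rows) are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'identifier': ['A', 'A', 'A', 'B', 'B', 'B'],
    'sample_id': ['s1', 's2', 's3', 's1', 's2', 's3'],
    'significant': [True, True, False, True, False, False],
})
n_sig = 2  # illustrative threshold

# Count, per identifier, the samples in which it is flagged significant.
sig_counts = df[df['significant']].groupby('identifier')['sample_id'].nunique()

# Keep every row of the identifiers that reach the threshold.
keep = sig_counts[sig_counts >= n_sig].index
filtered = df[df['identifier'].isin(keep)]
print(filtered)
```

‘A’ is significant in two samples and is kept; ‘B’ is significant in only one and is removed.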
- property sig
Terms with the significant flag set
Species data
- class magine.data.experimental_data.Sample(*args, **kwargs)
Bases: magine.data.base.BaseData
Provides tools for subsets of data types
- property by_sample
List of significantly flagged species by sample
- property down
Returns down-regulated species
- property down_by_sample
List of down-regulated species by sample
- property exp_methods
List of experimental methods in data
- property id_list
Set of species identifiers
- property label_list
Set of species labels
- plot_all(html_file_name, out_dir='out', plot_type='plotly', run_parallel=False)
Creates a plot for each species in the data
- Parameters
- html_file_name: str
Filename to save html of all plots
- out_dir: str, path
Directory that will contain all plots
- plot_type: str
plotly or matplotlib output
- run_parallel: bool
Create the plots in parallel
- plot_histogram(save_name=None, y_range=None, out_dir=None)
Plots a histogram of data
- Parameters
- save_name: str
Name of figure
- out_dir: str, path
Path to location to save figure
- y_range: array_like
range of data
- plot_pie_sig_ratio(save_name=None, ax=None, fig=None, figsize=None)
- Parameters
- save_name: str
- ax: matplotlib.axes, optional
- fig: matplotlib.figure
- figsize: tuple
Size of figure
- plot_species(species_list=None, subset_index=None, save_name=None, out_dir=None, title=None, plot_type='plotly', image_format='png')
Create scatter plot of species list
- Parameters
- species_list: list
List of compounds
- subset_index: list
Column to filter based on species_list
- save_name: str
Name of html output file
- out_dir: str
Location to place plots
- title: str
Title for HTML page
- plot_type: str
Type of plot outputs, can be “plotly” or “matplotlib”
- image_format: str
pdf or png, only used if plot_type=“matplotlib”
- Returns
- matplotlib.Figure or plotly.Figure
- property sample_ids
List of sample_ids in data
- subset(species=None, index='identifier', sample_ids=None, exp_methods=None)
- Parameters
- species: list, str
List of species to create subset dataframe from
- index: str
Index to filter based on provided ‘species’ list
- sample_ids: str, list
List or string to filter by sample_id
- exp_methods: str, list
List or string to filter by experimental method
- Returns
- magine.data.experimental_data.Sample
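To make the filtering semantics concrete, here is a pandas-only sketch of subsetting by a species list and by sample_ids. The helper `subset_rows` is hypothetical and only mimics the documented parameters; it is not the method's actual body and leaves out exp_methods.

```python
import pandas as pd

df = pd.DataFrame({
    'identifier': ['A', 'A', 'B', 'C'],
    'sample_id': ['s1', 's2', 's1', 's2'],
    'fold_change': [2.0, 3.0, -2.0, 1.5],
})


def subset_rows(data, species=None, index='identifier', sample_ids=None):
    """Hypothetical helper: filter rows by species and/or sample_ids."""
    out = data
    if species is not None:
        # Accept a single string or a list, as the parameter types suggest.
        species = [species] if isinstance(species, str) else list(species)
        out = out[out[index].isin(species)]
    if sample_ids is not None:
        sample_ids = [sample_ids] if isinstance(sample_ids, str) else list(sample_ids)
        out = out[out['sample_id'].isin(sample_ids)]
    return out


picked = subset_rows(df, species=['A', 'C'], sample_ids='s2')
print(picked)
```

The two filters are combined with a logical AND, so only rows matching both the species list and the sample remain.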
- property up
Returns up-regulated species
- property up_by_sample
List of up-regulated species by sample
- volcano_by_sample(save_name=None, p_value=0.1, out_dir=None, fold_change_cutoff=1.5, y_range=None, x_range=None, sig_column=False)
Creates a figure of volcano subplots, one for each sample
- Parameters
- save_name: str
Name to save figure
- out_dir: str, directory
Location to save figure
- sig_column: bool, optional
Whether to use the significant flags in the data
- p_value: float, optional
p-value threshold for significance
- fold_change_cutoff: float, optional
Fold-change threshold for significance
- y_range: array_like
Upper and lower bounds of plot in y direction
- x_range: array_like
Upper and lower bounds of plot in x direction
- volcano_plot(save_name=None, out_dir=None, sig_column=False, p_value=0.1, fold_change_cutoff=1.5, x_range=None, y_range=None)
Create a volcano plot of data
- Parameters
- save_name: str
Name to save figure
- out_dir: str, directory
Location to save figure
- sig_column: bool, optional
Whether to use the significant flags in the data
- p_value: float, optional
p-value threshold for significance
- fold_change_cutoff: float, optional
Fold-change threshold for significance
- y_range: array_like
Upper and lower bounds of plot in y direction
- x_range: array_like
Upper and lower bounds of plot in x direction
- Returns
- matplotlib.Figure
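How the two thresholds might classify a point can be sketched as follows. The conventional volcano axes (signed log2 fold change against -log10 p-value), the inclusive comparisons, and the function name `volcano_point` are all assumptions, not details confirmed by the MAGINE documentation.

```python
import math


def volcano_point(fold_change, p_value, p_cutoff=0.1, fold_change_cutoff=1.5):
    """Return (x, y, is_significant) for one measurement.

    Assumptions: fold_change is a non-zero signed ratio (e.g. 2.0 or -2.0),
    x is the signed log2 fold change, and y is -log10(p_value).
    """
    sign = 1.0 if fold_change > 0 else -1.0
    x = sign * math.log2(abs(fold_change))
    y = -math.log10(p_value)
    # A point is flagged only if it passes both thresholds.
    is_significant = p_value <= p_cutoff and abs(fold_change) >= fold_change_cutoff
    return x, y, is_significant


measurements = [(4.0, 0.01), (-2.0, 0.5), (1.2, 0.001)]
points = [volcano_point(fc, p) for fc, p in measurements]
for point in points:
    print(point)
```

With sig_column=True these thresholds would presumably be bypassed in favour of the data's own significant flags.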
- class magine.data.experimental_data.ExperimentalData(data_file)
Bases: object
Manages all experimental data
- property compounds
Only compounds in data
- Returns
- Sample
- create_summary_table(sig=False, index='identifier', save_name=None, plot=False, write_latex=False)
Creates a summary table of data.
- Parameters
- sig: bool
Flag to summarize significant species only
- save_name: str
Name to save csv and .tex file
- index: str
Index for counts
- plot: bool
If you want to create a plot of the table
- write_latex: bool
Create latex file of table
- Returns
- pandas.DataFrame
- property exp_methods
List of source columns
- property genes
All data tagged with gene
Includes protein and RNA.
- property proteins
Protein level data
Tagged with “gene” identifier that is not RNA
- property rna
RNA level data
Tagged with “RNA”
- property sample_ids
List of sample_ids
- property species
Returns data in Sample format
- Returns
- Sample
- subset(species, index='identifier')
- Parameters
- species: list, str
List of species to create subset dataframe from
- index: str
Index to filter based on provided ‘species’ list
- Returns
- magine.data.experimental_data.Sample
- volcano_analysis(out_dir, use_sig_flag=True, p_value=0.1, fold_change_cutoff=1.5)
Creates a volcano plot for each experimental method
- Parameters
- out_dir: str, path
Path to where the output figures will be saved
- use_sig_flag: bool
Use the significant flag of the data
- p_value: float, optional
p-value criterion for significance. Not used if use_sig_flag is True.
- fold_change_cutoff: float, optional
Fold-change criterion for significance. Not used if use_sig_flag is True.