Use cases

Introduction

The pybiouml library provides a set of methods for interacting with a BioUML server from within Python. BioUML is an open source integrated Java platform for the analysis of data from omics sciences research and other advanced computational biology, and for building the virtual cell and the virtual physiological human. It spans a comprehensive range of capabilities, including access to databases with experimental data, tools for the formalized description of the structure and functioning of biological systems, and tools for their visualization, simulation, parameter fitting and analysis.

Getting started

Connecting to BioUML server

The first thing you need to do is load the package and log into the BioUML server. As an example, we will connect to the free public BioUML server.

Install the library with pip:

pip install pybiouml
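
To confirm that the installation succeeded before opening a notebook, a quick check along these lines can help. This is a minimal sketch that uses only the standard library; the helper name is_installed is ours and is not part of pybiouml.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# 'pybiouml' is the module name imported in the next step of this guide.
if not is_installed("pybiouml"):
    print("pybiouml is not installed; run: pip install pybiouml")
```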

The next step is to import the library and create an object that will represent your connection:

[1]:

from pybiouml import pybiouml

After importing the library, we create an object of its PyBiouml class:

[2]:

my_work = pybiouml.PyBiouml()

Now we are ready to work. First, we need to log in to the server. This is done with the login method of the my_work object created earlier.

The login method also accepts a user name and password, but we leave them empty in the example below for an anonymous login. Alternatively, you can install BioUML on your local computer and connect to it in the same way:

my_work.login('localhost:8080')

See BioUML installation for details on BioUML server installation.

[3]:

my_work.login(url='https://ict.biouml.org')
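
For a non-anonymous login it is usually better to keep credentials out of the notebook. The sketch below collects login arguments from environment variables. It rests on assumptions: the variable names BIOUML_USER and BIOUML_PASSWORD are hypothetical, and the keyword names user and password are inferred from the description above rather than checked against the library.

```python
import os


def login_arguments(url, environ=None):
    """Collect keyword arguments for login() from the environment.

    BIOUML_USER and BIOUML_PASSWORD are hypothetical variable names, and the
    'user' / 'password' keyword names are assumed, not confirmed.
    """
    environ = os.environ if environ is None else environ
    kwargs = {"url": url}
    user = environ.get("BIOUML_USER")
    password = environ.get("BIOUML_PASSWORD")
    # Only add credentials when both are present; otherwise stay anonymous.
    if user and password:
        kwargs["user"] = user
        kwargs["password"] = password
    return kwargs


print(login_arguments("https://ict.biouml.org", environ={}))
# Intended use (not executed here):
# my_work.login(**login_arguments("https://ict.biouml.org"))
```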

Querying BioUML repository

The BioUML repository (or simply the repository) is the central BioUML data store. All the data you work with in BioUML is stored in the repository. The repository has a hierarchical structure similar to a file system. At the top level, the repository consists of several root folders. The most common ones are:

databases contains preinstalled or user-defined modules.
data contains user projects and public examples.

The ls method lists the contents of a given folder in the repository. For example, the list of databases available on the BioUML server:

[5]:

import requests
my_work.ls('databases')

[5]:

	name	hasChildren	class
0	Biomodels	True	0
1	EnsemblArabidopsisThaliana91	True	1
2	EnsemblFruitfly91	True	1
3	EnsemblHuman85_38	True	1
4	EnsemblMouse81_38	True	1
5	EnsemblNematoda91	True	1
6	EnsemblRat91	True	1
7	EnsemblSaccharomycesCerevisiae91	True	1
8	EnsemblSchizosaccharomycesPombe91	True	1
9	EnsemblZebrafish92	True	1
10	GTRD	True	1
11	HOCOMOCO v11	True	0
12	PantherDB 14	True	0
13	Reactome Icons	True	2
14	Reactome63	True	1
15	Tests SBML 3.3.0	True	0
16	Tests Stochastic	True	0
17	Virtual Cell	True	0
18	Virtual Human	True	0

The list of data elements available in a BioUML examples folder:

[6]:

my_work.ls('data/Examples/Optimization/Data/Experiments')

[6]:

	name	hasChildren
0	exp_data_1	False
1	exp_data_2	False
2	exp_data_3	False
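
Because the repository is hierarchical and each listing reports hasChildren, a folder tree can be explored by calling ls repeatedly. The following is a minimal sketch of such a walk, not a feature of pybiouml. The function walk and the in-memory fake_repository are hypothetical; the only thing taken from the listings above is the presence of the name and hasChildren columns.

```python
def walk(ls, path, max_depth=2):
    """Yield repository paths below `path`, depth-first, using an ls-like callable.

    `ls(path)` is expected to return a table with 'name' and 'hasChildren'
    columns, as in the listings above (a pandas.DataFrame works, and so does
    a plain dict of lists).
    """
    listing = ls(path)
    for name, has_children in zip(listing["name"], listing["hasChildren"]):
        child = f"{path}/{name}"
        yield child
        # Descend only into folders, and only while depth remains.
        if has_children and max_depth > 1:
            yield from walk(ls, child, max_depth - 1)


# Offline stand-in for my_work.ls so the sketch runs without a server.
fake_repository = {
    "data/Examples": {"name": ["Optimization"], "hasChildren": [True]},
    "data/Examples/Optimization": {"name": ["Data"], "hasChildren": [True]},
}

for found in walk(fake_repository.get, "data/Examples", max_depth=2):
    print(found)
```

With a live connection, my_work.ls would be passed as the first argument. Keeping max_depth small is sensible, since every visited folder presumably costs one request to the server.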

The pybiouml get method fetches a table from the BioUML repository as a pandas.DataFrame:

[7]:

exp_1 = my_work.get('data/Examples/Optimization/Data/Experiments/exp_data_1')
exp_1.head()

[7]:

	time	p43p41	pro8	casp8
0	0	0.057725	59.963164	0
1	10	0.268144	57.564637	0.041075
2	20	4.760481	58.589814	0.316117
3	30	8.251935	59.421561	1.397356
4	45	16.144483	48.189751	3.520371

This method can fetch not only true BioUML tables, but any data element that has a tabular representation, including profiles, user-uploaded tracks and so on.

To store a pandas.DataFrame as a table in the BioUML repository, use the put method:

[8]:

exp_1['sum_column'] = exp_1[['pro8', 'casp8']].sum(axis=1)
exp_1

[8]:

	time	p43p41	pro8	casp8	sum_column
0	0	0.057725	59.963164	0	59.963164
1	10	0.268144	57.564637	0.041075	57.605712
2	20	4.760481	58.589814	0.316117	58.905930
3	30	8.251935	59.421561	1.397356	60.818917
4	45	16.144483	48.189751	3.520371	51.710122
5	60	17.020606	38.950266	3.947229	42.897495
6	90	15.269292	23.501692	4.871417	28.373108
7	120	12.53013	13.127419	4.87786	18.005280
8	150	10.334704	10.703102	4.228311	14.931413
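
To try the same column calculation without a server connection, the first rows of the table can be rebuilt locally. This sketch copies the five rows shown by exp_1.head() earlier and assumes only that pandas is installed; the variable name exp_1_head is ours.

```python
import pandas as pd

# First five rows of exp_data_1, copied from the head() output shown earlier.
exp_1_head = pd.DataFrame({
    "time": [0, 10, 20, 30, 45],
    "p43p41": [0.057725, 0.268144, 4.760481, 8.251935, 16.144483],
    "pro8": [59.963164, 57.564637, 58.589814, 59.421561, 48.189751],
    "casp8": [0.0, 0.041075, 0.316117, 1.397356, 3.520371],
})

# axis=1 sums across the selected columns, giving one value per row.
exp_1_head["sum_column"] = exp_1_head[["pro8", "casp8"]].sum(axis=1)
print(exp_1_head)
```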

[14]:

my_work.put('data/Collaboration/Demo/Data/pybiouml_test/exp_1_pybiouml', exp_1)

[15]:

my_work.ls('data/Collaboration/Demo/Data/pybiouml_test')

[15]:

	name	hasChildren	class
0	exp_1_pybiouml	False	0
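
The put call followed by ls, as done above, can be wrapped into one step that fails loudly if the table does not appear. This is a sketch under assumptions: put_and_verify and FakeClient are hypothetical names, and the fake client only imitates the put(path, table) and ls(path) calls used in this guide so that the example runs offline.

```python
def put_and_verify(client, folder, name, table):
    """Store `table` under folder/name and confirm that ls() now lists it.

    `client` is any object with put(path, table) and ls(path) methods used as
    in the cells above; the helper itself is hypothetical.
    """
    path = f"{folder}/{name}"
    client.put(path, table)
    listed = list(client.ls(folder)["name"])
    if name not in listed:
        raise RuntimeError(f"{path} was not found after put()")
    return path


class FakeClient:
    """In-memory stand-in for a connected PyBiouml object (illustration only)."""

    def __init__(self):
        self.store = {}

    def put(self, path, table):
        self.store[path] = table

    def ls(self, folder):
        prefix = folder + "/"
        names = [p[len(prefix):] for p in self.store if p.startswith(prefix)]
        return {"name": names}


print(put_and_verify(FakeClient(), "data/Collaboration/Demo/Data/pybiouml_test",
                     "exp_1_pybiouml", table=[[0, 1.0]]))
```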

Using BioUML analyses

BioUML provides a set of analyses organized in groups. The list of analyses available on the current server can be fetched with the analysis_list method.

[16]:

a_l = my_work.analysis_list()
a_l

[16]:

	Group	Name
0	ChIP-seq	ChIP-seq Quality control analysis
1	ChIP-seq	ChIP-seq peak profile
2	ChIP-seq	Quality control analysis
3	ChIP-seq	Report generator for quality control analysis
4	ChIP-seq	Run MACS 1_3_7 on ChiP-Seq
...	...	...
229	Workflow utils	Check Workflow consistency
230	Workflow utils	Copy data element
231	Workflow utils	Copy folder
232	Workflow utils	Create folder
233	Workflow utils	Run a Workflow as Analysis

234 rows × 2 columns

[17]:

a_l['Group'].unique()

[17]:

array(['ChIP-seq', 'Composite module analyses',
       'Differential algebraic equations', 'GATK', 'GTRD',
       'Gene set analysis', 'Import', 'Match sites and genes', 'MicroRNA',
       'Molecular networks', 'Motif discovery', 'Mutations',
       'NGS alignment', 'NGS color-space', 'NGS utils',
       'Operations with genomic tracks', 'Parameter fitting',
       'Plots and charts', 'RNA-seq', 'Statistics',
       'TF binding site search', 'Table manipulation', 'Workflow utils'],
      dtype=object)

[47]:

a_l[a_l['Group'] == 'Table manipulation']

[47]:

	Group	Name
213	Table manipulation	Add calculated column
214	Table manipulation	Annotate table
215	Table manipulation	Convert table
216	Table manipulation	Convert table via homology
217	Table manipulation	Filter duplicate rows
218	Table manipulation	Filter table
219	Table manipulation	Group table rows
220	Table manipulation	Join several tables
221	Table manipulation	Join two tables
222	Table manipulation	Merge table columns
223	Table manipulation	Select random rows
224	Table manipulation	Select table columns
225	Table manipulation	Select top rows
226	Table manipulation	Super annotate table
227	Table manipulation	Transform table
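
Besides filtering by group, it is often handy to search the list by a keyword in the analysis name. The helper below is a small sketch, not part of pybiouml. It assumes only the Group and Name columns shown above, and it is demonstrated on a few rows copied from that output so it runs without a server.

```python
def find_analyses(listing, keyword):
    """Return (group, name) pairs whose analysis name contains `keyword`.

    `listing` needs 'Group' and 'Name' columns, like the table returned by
    analysis_list() above; matching is case-insensitive.
    """
    keyword = keyword.lower()
    return [
        (group, name)
        for group, name in zip(listing["Group"], listing["Name"])
        if keyword in name.lower()
    ]


# A few rows copied from the output above, as a plain dict of lists.
sample = {
    "Group": ["Table manipulation", "Table manipulation", "Workflow utils"],
    "Name": ["Filter duplicate rows", "Filter table", "Copy folder"],
}
print(find_analyses(sample, "filter"))
```

Since the function only indexes columns by name, the DataFrame a_l from the earlier cell should work in place of sample.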

Each BioUML analysis has a set of parameters. The analysis_parameters method returns a pandas.DataFrame with one row per parameter and two columns, Name and Description.

[49]:

my_work.analysis_parameters('Filter table')

[49]:

	Name	Description
0	inputPath	Table to filter
1	filterExpression	Expression in JavaScript like 'ColumnName1 > 5...
2	filteringMode	Which rows to select
3	outputPath	Path to the filtered table
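
The parameter names listed above can be used to catch misspelled keys before an analysis is launched. The following sketch is hypothetical helper code, not a pybiouml feature. The names are copied from the output above, and the typo in the candidate dictionary is deliberate.

```python
def unknown_parameters(known_names, parameters):
    """Return the keys of `parameters` that are not in `known_names`, sorted."""
    return sorted(set(parameters) - set(known_names))


# Parameter names copied from the 'Filter table' output above.
filter_table_names = ["inputPath", "filterExpression", "filteringMode", "outputPath"]

candidate = {
    "inputPath": "data/Examples/Optimization/Data/Experiments/exp_data_1",
    "filterExpresion": "time < 40",  # deliberate typo for the demonstration
}
print(unknown_parameters(filter_table_names, candidate))
```

With a live connection, the Name column of my_work.analysis_parameters('Filter table') could supply known_names instead of the hard-coded list.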

The analysis method launches an analysis with the given parameters.

[19]:

my_work.analysis('Filter table',
                 parameters={
                     'inputPath': 'data/Examples/Optimization/Data/Experiments/exp_data_1',
                     'filterExpression': 'time < 40',
                     'outputPath': 'data/Collaboration/Demo/Data/pybiouml_test/exp_data_1 filtered'
                 }
                )

INFO - Analysis 'Filter table' added to queue
INFO - Analysis 'Filter table' started
INFO - Filtering...



INFO - Writing result...

INFO - Analysis 'Filter table' finished (3.968 s)

RJOB202202131652562

Importing and exporting files

As described previously, a pandas.DataFrame can be fetched from and stored to the BioUML repository using the pybiouml get and put methods. In addition, data can be imported from files and exported to files in various formats. The list of importers can be obtained with the importers method.

[58]:

my_work.importers()[:10]

[58]:

['BioUML format(*.dml)',
 'BioUML Simulation result',
 'ZIP-archive (*.zip)',
 'Generic file',
 'Image file (*.png, *jpeg, *.gif etc)',
 'Text file (*.txt)',
 'HTML file (*.html, *.htm)',
 'SBML',
 'SBML(CellDesigner)',
 'BioPAX file (*.owl, *.xml)']
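
Many importer names carry their file patterns in parentheses, which makes it possible to shortlist importers for a given file name. The sketch below does this with the standard library only. The function importers_for is hypothetical, the parsing of the names is a heuristic based on the ten entries shown above, and names without patterns (such as 'Generic file') are never suggested.

```python
import fnmatch
import re


def importers_for(filename, importer_names):
    """Return importer names whose '(*.ext, ...)' patterns match `filename`."""
    matches = []
    for name in importer_names:
        # Take the text inside the trailing pair of parentheses, if any.
        inside = re.search(r"\(([^()]*)\)\s*$", name)
        if not inside:
            continue
        # Keep only glob-like tokens such as '*.owl' (or '*jpeg').
        patterns = re.findall(r"\*[.\w]+", inside.group(1))
        if any(fnmatch.fnmatch(filename.lower(), p.lower()) for p in patterns):
            matches.append(name)
    return matches


# The ten importer names shown in the output above.
shown_importers = [
    "BioUML format(*.dml)",
    "BioUML Simulation result",
    "ZIP-archive (*.zip)",
    "Generic file",
    "Image file (*.png, *jpeg, *.gif etc)",
    "Text file (*.txt)",
    "HTML file (*.html, *.htm)",
    "SBML",
    "SBML(CellDesigner)",
    "BioPAX file (*.owl, *.xml)",
]
print(importers_for("model.owl", shown_importers))
```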

As an example, we will import a FASTA file into BioUML. This FASTA file can be downloaded from our GitHub.

[20]:

fasta = 'hiv1.fna'
out = 'data/Collaboration/Demo/Data/pybiouml_test'
my_work.to_import(fasta, out, importer='Fasta format (*.fasta)')



data/Collaboration/Demo/Data/pybiouml_test/hiv1

[20]:

'data/Collaboration/Demo/Data/pybiouml_test/hiv1'

[21]:

my_work.ls(out)

[21]:

	name	hasChildren	class
0	exp_1_pybiouml	False	0
1	exp_data_1 filtered	False	0
2	hiv1	True	1
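
In the example above, the imported element was named after the local file without its extension. Assuming that naming rule holds, the resulting repository path can be predicted as sketched below. The helper expected_import_path is hypothetical, and the server may choose a different name in other cases, so the value returned by to_import remains the authoritative one.

```python
import posixpath
from pathlib import Path


def expected_import_path(local_file, folder):
    """Guess the repository path created by importing `local_file` into `folder`.

    Assumption: the element is named after the file stem, as happened for
    hiv1.fna above.
    """
    return posixpath.join(folder, Path(local_file).stem)


print(expected_import_path("hiv1.fna", "data/Collaboration/Demo/Data/pybiouml_test"))
```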

Similarly, we can use the export method to export data from the BioUML repository. The list of available exporters can be obtained with the exporters method.

[22]:

my_work.exporters()

[22]:

['JPEG file (*.jpg)',
 'Bitmap file(*.bmp)',
 'Portable Network Graphics (*.png)',
 'BioUML format(*.dml)',
 'BioUML state (*.xml)',
 'Pair graph file(*.txt)',
 'Archive containing exported elements (*.zip)',
 'Generic file',
 'Zipped HTML file',
 'SBML',
 'BioPAX (*.owl)',
 'FASTA sequences (*.fasta)',
 'BED format (*.bed)',
 'Interval format (*.interval)',
 'General Feature Format (*.gff)',
 'Gene Transfer Format (*.gtf)',
 'Match format (*.match)',
 'Variant Call Format (*.vcf)',
 'Wiggle format (*.wig)',
 'SAM or BAM alignment file (*.sam, *.bam)',
 'ZHTML document (*.zhtml)',
 'SDF file (*.sdf)',
 'GraphML(*.graphml)',
 'Scalable Vector Graphics(*.svg)',
 'SBGN-ML',
 'COMBINE archive (*.omex)',
 'BioNetGen language format (*.bngl)',
 'Cytoscape (*.cx)',
 'Antimony',
 'Tab-separated text (*.txt)',
 'Comma-separated values (*.csv)',
 'HTML document (*.html)']

[24]:

my_work.export('data/Collaboration/Demo/Data/pybiouml_test/hiv1.fa',
               exporter='Fasta format (*.fasta)',
               target_file='downloaded_hiv1.fa'
              )

[1]:

import os
os.listdir()

[1]:

['.ipynb_checkpoints', 'downloaded_hiv1.fa', 'hiv1.fna', 'Use_cases.ipynb']
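
Instead of inspecting the directory listing by eye, the check for the downloaded file can be automated. This is a minimal sketch: export_and_check and FakeExporter are hypothetical names, the export call mirrors the keyword arguments used in the cell above, and the fake object simply writes a tiny FASTA record so the example runs without a server.

```python
import os
import tempfile


def export_and_check(client, path, exporter, target_file):
    """Call client.export(...) as in the cell above and confirm the file exists.

    Returns the size of the exported file in bytes.
    """
    client.export(path, exporter=exporter, target_file=target_file)
    if not os.path.isfile(target_file):
        raise FileNotFoundError(target_file)
    return os.path.getsize(target_file)


class FakeExporter:
    """Offline stand-in that writes a tiny FASTA record (illustration only)."""

    def export(self, path, exporter, target_file):
        with open(target_file, "w") as handle:
            handle.write(">example\nACGT\n")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "downloaded_example.fa")
    size = export_and_check(FakeExporter(), "some/repository/path",
                            "Fasta format (*.fasta)", target)
    print(size)
```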
