Modules Reference

dimcli.login

Dimcli utilities for logging in/out of the Dimensions API. NOTE: these functions are attached to the top level dimcli module. So you can load them as follows:

>>> import dimcli
>>> dimcli.login()
dimcli.__init__.login(username='', password='', endpoint='', instance='', key='', verify_ssl=True, verbose=True)[source]

Log into the Dimensions API and store the query token in memory.

Two cases, with a few defaults:

  • If credentials are provided, the login is performed using those credentials.
  • If credentials are not passed, login is attempted using the local dsl.ini credentials file (a sample file is sketched below).
    • If neither instance nor endpoint is provided, instance defaults to ‘live’.

    • If an endpoint url is provided, the first matching directive in the credentials file is used.
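
For reference, a minimal dsl.ini credentials file might look like the sketch below. The instance name, URL and key are placeholders; check the dimcli documentation for the exact format:

[instance.live]
url=https://app.dimensions.ai
key=your-secret-key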

Parameters
  • username (str, optional) – The API username

  • password (str, optional) – The API password

  • endpoint (str, optional) – The API endpoint - default is “https://app.dimensions.ai”

  • instance (str, optional) – The instance name, from the local dsl.ini credentials file. Default: ‘live’

  • key (str, optional) – The API key (available to some users instead of username/password)

  • verify_ssl (bool, optional) – Verify SSL certificates for HTTPS requests. Default: True.

  • verbose (bool, optional) – Verbose mode. Default: True.

Notes

The endpoint value can either be simply the Dimensions server hostname or the full API endpoint path. All the options below are valid endpoints:

  • https://app.dimensions.ai

  • https://app.dimensions.ai/api/dsl/v1

  • https://app.dimensions.ai/api/dsl/v2

About SSL verification:

Dimcli internally uses the Requests library, which verifies SSL certificates for HTTPS requests, just like a web browser. For some users, it is necessary to turn off SSL verification in order to connect to the API. This can be achieved by passing verify_ssl=False at login time. All subsequent API queries will then skip SSL verification. NOTE: this setting can also be added to the dsl.ini file with the following line: verify_ssl=false.
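
For instance, to disable SSL verification at login time (a minimal sketch; only do this if your network setup requires it):

>>> dimcli.login(key="my-secret-key", verify_ssl=False)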

Example

If you have already set up the credentials file (see above), there is no need to pass login details

>>> dimcli.login()

Otherwise you can authenticate by passing your login details as arguments

>>> dimcli.login(key="my-secret-key")

You can also specify the endpoint, which by default is set to “https://app.dimensions.ai”

>>> dimcli.login(key="my-secret-key", endpoint="https://nannies-research.dimensions.ai")

Legacy authentication mechanisms with username/password are also supported

>>> dimcli.login(username="mary.poppins", password="chimneysweeper", endpoint="https://nannies-research.dimensions.ai")
dimcli.__init__.login_status()[source]

Utility to check whether we are logged in or not

Returns

True if logged in, otherwise False.

Return type

bool

Example

>>> dimcli.login_status()
False
dimcli.__init__.logout()[source]

Reset the connection to the Dimensions API.

This allows a new connection to be created subsequently, e.g. to a different endpoint.

Example

>>> dimcli.logout()
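
After logging out, one can log in again, e.g. against a different endpoint (the endpoint below is illustrative):

>>> dimcli.login(key="my-secret-key", endpoint="https://another-server.dimensions.ai")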

dimcli.core.api

Dimcli objects for querying the Dimensions API. NOTE: these objects are attached to the top level dimcli module. So you can load them as follows:

>>> import dimcli
>>> dsl = dimcli.Dsl()
class dimcli.core.api.Dsl(show_results=False, verbose=True, auth_session=False)[source]

Bases: object

The Dsl object is the main interface for interacting with the Dimensions API.

Parameters
  • show_results (bool, default=False) – Set a global setting that determines whether query JSON results get printed out. Note that in Jupyter environments this is not needed, because iPython rich widgets are used by default.

  • auth_session (APISession, default=False) – Set an authenticated session object that should be used for querying. Used only in special situations, as an alternative to the dimcli.login() utility method.

  • verbose (bool, default=True) – Verbose mode.

Example

>>> import dimcli
>>> dimcli.login()
>>> dsl = dimcli.Dsl()
>>> dsl.query("""search grants for "graphene" return researchers""")
<dimcli.dimensions.DslDataset object>
>>> _.json
{'researchers': [{'id': 'ur.01332073522.49',
        'count': 75,
        'last_name': 'White',
        'first_name': 'Nicholas J'},
    "... JSON data continues ... "

In some special situations, you may want to query two separate Dimensions servers in parallel. To that end, it is possible to pass an APISession instance to the Dsl() constructor using the auth_session parameter, i.e.:

>>> import dimcli
>>> from dimcli.core.auth import APISession
# set up first authentication backend
>>> mysession1 = APISession()
>>> mysession1.login(instance="app.dimensions.ai")
>>> d1 = Dsl(auth_session=mysession1)
>>> d1.query("search publications return research_orgs")
# set up second authentication backend
>>> mysession2 = APISession()
>>> mysession2.login(instance="another-app.dimensions.ai")
>>> d2 = Dsl(auth_session=mysession2)
>>> d2.query("search publications return research_orgs")
query(q, show_results=None, retry=0, verbose=None)[source]

Execute a single DSL query.

This method handles the query token from the API and regenerates it if it’s expired. If the API throws a ‘Too Many Requests for the Server’ error, the method sleeps 30 seconds before retrying.

Parameters
  • show_results (bool, default=None) – Setting that determines whether the query JSON results should be printed out. If None, it inherits from the Dsl global setting. Note that in Jupyter environments this is not needed, because iPython rich widgets are used by default.

  • retry (int, default=0) – Number of times to retry the query if it fails.

  • verbose (bool, default=None) – Verbose mode. If None, it inherits from the Dsl global setting.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

Example

>>> dsl = dimcli.Dsl()
>>> dsl.query("search grants where start_year=2020 return grants")
<dimcli.dimensions.DslDataset object>
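
A sketch using the retry parameter described above, so that a failing query is attempted again a couple of times:

>>> dsl.query("search grants where start_year=2020 return grants", retry=2)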
query_iterative(q, show_results=None, limit=1000, skip=0, pause=1.5, force=False, maxlimit=0, verbose=None, _tot_count_prev_query=0, _warnings_tot=None)[source]

Run a DSL query and keep querying until all matching records have been extracted.

The API returns a maximum of 1000 records per call. If a DSL query results in more than 1000 matches, it is possible to use pagination to get more results, up to 50k.

Iterative querying works by automatically paginating through all records available for a result set. The original query gets turned into a loop that uses the limit / skip operators until all the results available have been extracted.

NOTE: if any of the iterative queries produce warning messages, these are aggregated and added to the `_warnings` section of the output data.

Parameters
  • q (str) – The DSL query. Important: pagination keywords eg limit / skip should be omitted.

  • show_results (bool, default=None) – Determines whether the final results are rendered via the iPython display widget (for Jupyter notebooks).

  • limit (int, default=1000) – How many records to extract per iteration. Defaults to 1000.

  • skip (int, default=0) – Offset for first iteration. Defaults to 0. After the first iteration, this value is calculated dynamically.

  • pause (float, default=1.5) – How much time to pause after each iteration, expressed in seconds. Defaults to 1.5. Note: each iteration gets timed, so the pause time is used only when the query time is more than 2s.

  • force (bool, default=False) – Continue the extraction even if one of the iterations fails due to an error.

  • maxlimit (int, default=0) – The maximum number of records to extract in total. If 0, all available records are extracted, up to the API upper limit of 50k records per query.

  • verbose (bool, default=None) – Verbose mode.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

Example

>>> dsl = dimcli.Dsl()
>>> dsl.query_iterative("""search grants where category_for.name="0206 Quantum Physics" return grants""")
Starting iteration with limit=1000 skip=0 ...
0-1000 / 8163 (4.062144994735718s)
1000-2000 / 8163 (1.5146172046661377s)
2000-3000 / 8163 (1.7225260734558105s)
3000-4000 / 8163 (1.575329065322876s)
4000-5000 / 8163 (1.521540880203247s)
5000-6000 / 8163 (1.471721887588501s)
6000-7000 / 8163 (1.5068159103393555s)
7000-8000 / 8163 (1.4724757671356201s)
8000-8163 / 8163 (0.7611980438232422s)
===
Records extracted: 8163
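A sketch combining the maxlimit and pause parameters described above, e.g. to cap the extraction at 2000 records and slow down the pace between iterations:

>>> dsl.query_iterative("""search grants where start_year=2020 return grants""", maxlimit=2000, pause=3)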

class dimcli.core.api.DslDataset(data)[source]

Bases: IPython.core.display.JSON

Wrapper for JSON results from DSL.

This object makes it easier to process, save and load API JSON data.

Example

>>> dsl = dimcli.Dsl()
>>> data = dsl.query("""search publications for "machine learning" return publications limit 100""")
Returned Publications: 100 (total = 2501114)
Time: 1.36s
>>> print(data)
<dimcli.DslDataset object #4383191536. Records: 100/2501114>
>>> len(data)
100
>>> data.count_batch
100
>>> data.count_total
2501114
>>> data.json
#  => returns the underlying JSON data
>>> data['publications']
#  => shortcut for the 'publications' key in the underlying JSON data
>>> data.publications
#  => ..this is valid too!
as_dataframe(key='', links=False, nice=False)[source]

Return the JSON data as a Pandas DataFrame.

If key is empty, the first available JSON key (eg ‘publications’) is used to determine what JSON data should be turned into a dataframe (mostly relevant when using multi-result DSL queries).

Parameters
  • key (str, optional) – The JSON results data object that needs to be processed.

  • links (bool, optional) – Transform suitable fields to hyperlinks. Default: False.

  • nice (bool, optional) – Reformat column names and complex values where possible. Useful for visual inspection and printing out. Default: False.

Returns

A DataFrame instance containing API records.

Return type

pandas.DataFrame

Example

See https://api-lab.dimensions.ai/cookbooks/1-getting-started/3-Working-with-dataframes.html
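
A minimal sketch; the multi-result query below is illustrative:

>>> data = dsl.query("search publications return research_orgs return funders")
>>> df = data.as_dataframe(key="funders")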

as_dataframe_authors(links=False)[source]

Return the JSON data as a Pandas DataFrame, in which each row corresponds to a publication author.

This method works only with ‘publications’ queries and it’s clever enough to know if the authors or author_affiliations (deprecated) fields are used. The list of affiliations for each author is not broken down and is returned as JSON. So in essence you get one row per author.

Returns

A DataFrame instance containing API records.

Return type

pandas.DataFrame

Example

See https://api-lab.dimensions.ai/cookbooks/1-getting-started/3-Working-with-dataframes.html
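
A minimal sketch (the field selection below is illustrative):

>>> data = dsl.query('search publications for "graphene" return publications[id+title+authors] limit 10')
>>> authors_df = data.as_dataframe_authors()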

as_dataframe_authors_affiliations(links=False)[source]

Return the JSON data as a Pandas DataFrame, in which each row corresponds to a publication affiliation.

This method works only with ‘publications’ queries and it’s clever enough to know if the authors or author_affiliations (deprecated) fields are used. If an author has multiple affiliations, each affiliation is represented in a separate row (hence the same author may appear in multiple rows).

Returns

A DataFrame instance containing API records.

Return type

pandas.DataFrame

Example

See https://api-lab.dimensions.ai/cookbooks/1-getting-started/3-Working-with-dataframes.html

as_dataframe_concepts(key='', links=False)[source]

Return the JSON data as a Pandas DataFrame, in which each row corresponds to a single ‘concept’.

This method works only with ‘publications’ and ‘grants’ queries and it’s clever enough to know if the concepts or concepts_scores fields are used. Additional metrics like ‘frequency’ and ‘score_average’ are also included in the results.

Returns

A DataFrame instance containing API records.

Return type

pandas.DataFrame

Example

See https://api-lab.dimensions.ai/cookbooks/1-getting-started/3-Working-with-dataframes.html
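
A minimal sketch (assuming the concepts_scores field is included in the query):

>>> data = dsl.query('search publications for "graphene" return publications[id+concepts_scores] limit 10')
>>> concepts_df = data.as_dataframe_concepts()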

as_dataframe_funders(links=False)[source]

Return the JSON data as a Pandas DataFrame, in which each row corresponds to a single ‘funder’.

This method works only with ‘grants’ queries.

Returns

A DataFrame instance containing API records.

Return type

pandas.DataFrame

Example

See https://api-lab.dimensions.ai/cookbooks/1-getting-started/3-Working-with-dataframes.html

as_dataframe_investigators(links=False)[source]

Return the JSON data as a Pandas DataFrame, in which each row corresponds to a single ‘investigator’.

This method works only with ‘grants’ queries.

Returns

A DataFrame instance containing API records.

Return type

pandas.DataFrame

Example

See https://api-lab.dimensions.ai/cookbooks/1-getting-started/3-Working-with-dataframes.html

as_dimensions_url(records=500, verbose=True)[source]

Utility that turns a list of records into a Dimensions webapp URL, by using the record IDs as filters.

NOTE: this functionality is EXPERIMENTAL and may break or be removed in future versions. Also, it works only with: publications, grants, patents, clinical_trials, policy_documents.

Parameters
  • records (int, default=500) – The number of record IDs to use. With more than 500, it is likely to incur a ‘414 Request-URI Too Large’ error.

  • verbose (bool, default=True) – Verbose mode

Returns

A string representing a Dimensions URL.

Return type

str

Example

>>> data = dsl.query("""search publications where id in ["pub.1120715293", "pub.1120975084", "pub1122068834", "pub.1120602308"] return publications""")
>>> data.as_dimensions_url()
'https://app.dimensions.ai/discover/publication?search_text=id%3A+%28pub.1120975084+OR+pub.1120715293+OR+pub.1120602308%29'
chunks(size=400, key='')[source]

Return an iterator for going through chunks of the JSON results.

Note: in DSL queries with multiple return statements it is better to specify which result-type needs to be chunked using the key parameter.

Parameters
  • size (int, default=400) – Number of objects (records) to include in each chunk.

  • key (str, optional) – The JSON results data object that needs to be chunked eg ‘publications’ or ‘grants’. If not specified, the first available dict key is used.

Returns

An iterator object

Return type

iterator

Example

Break up a 1000-record dataset into groups of 100.

>>> data = dslquery("search publications return publications limit 1000")
>>> groups = [len(x) for x in data.chunks(size=100)]
property count_batch

Number of results returned from the query.

Returns

The number of results

Return type

int

property count_total

Total number of results in Dimensions for the query (as opposed to the results returned in the JSON payload).

Returns

The number of results

Return type

int

property errors_string

Utility that merges all error messages into a single string.

classmethod from_clinical_trials_list(data)[source]

Utility method that simulates an API results DslDataset object from raw clinical_trials data. See the from_publications_list method for more information.

Parameters

data (list or pandas dataframe) – A clinical_trials list (using the API DSL structure), in the form of either a list of dictionaries, or as a pandas dataframe.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

classmethod from_grants_list(data)[source]

Utility method that simulates an API results DslDataset object from raw grants data. See the from_publications_list method for more information.

Parameters

data (list or pandas dataframe) – A grants list (using the API DSL structure), in the form of either a list of dictionaries, or as a pandas dataframe.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

classmethod from_organizations_list(data)[source]

Utility method that simulates an API results DslDataset object from raw organizations data. See the from_publications_list method for more information.

Parameters

data (list or pandas dataframe) – An organizations list (using the API DSL structure), in the form of either a list of dictionaries, or as a pandas dataframe.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

classmethod from_patents_list(data)[source]

Utility method that simulates an API results DslDataset object from raw patents data. See the from_publications_list method for more information.

Parameters

data (list or pandas dataframe) – A patents list (using the API DSL structure), in the form of either a list of dictionaries, or as a pandas dataframe.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

classmethod from_policy_documents_list(data)[source]

Utility method that simulates an API results DslDataset object from raw policy_documents data. See the from_publications_list method for more information.

Parameters

data (list or pandas dataframe) – A policy_documents list (using the API DSL structure), in the form of either a list of dictionaries, or as a pandas dataframe.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

classmethod from_publications_list(data)[source]

Utility method that simulates an API results DslDataset object from raw publications data.

This functionality can be used to reload data that was cached locally, or to combine the merged results of separate API queries into a single DslDataset object.

Once created, the DslDataset object behaves exactly as when it is obtained from an API query (so one can take advantage of dataframe creation methods, for example).

Parameters

data (list or pandas dataframe) – A list of publications, in the form of either a list of dictionaries, or as a pandas dataframe.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

Example

>>> dsl = dimcli.Dsl()
>>> rawdata = dsl.query("search publications return publications").publications
>>> type(rawdata)
list
>>> newDataset = dimcli.DslDataset.from_publications_list(rawdata)
>>> newDataset
<dimcli.DslDataset object #4767014816. Records: 20/20>
classmethod from_researchers_list(data)[source]

Utility method that simulates an API results DslDataset object from raw researchers data. See the from_publications_list method for more information.

Parameters

data (list or pandas dataframe) – A researchers list (using the API DSL structure), in the form of either a list of dictionaries, or as a pandas dataframe.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

good_data_keys()[source]

Utility that returns the ‘data’ keys of the inner JSON object, excluding metadata like ‘stats’, ‘warnings’ and ‘version’ info.

Returns

A list of dictionary keys.

Return type

list

Example

>>> queryresults.good_data_keys()
['publications']
keys_and_count()[source]

Utility that previews the contents of the inner JSON object.

Returns

A list of tuples.

Return type

list

Example

>>> queryresults.keys_and_count()
[('_stats', 3), ('_warnings', 1), ('_version', 2), ('publications', 100)]
classmethod load_json_file(filename, verbose=False)[source]

Load a file containing DSL JSON data and return a valid DslDataset object.

Note: this is normally used in combination with the to_json_file method.

Parameters

filename (str) – A valid filename (including path if necessary) that contains the JSON data.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

Example

Save the results of a query to a JSON file, then reload the same file and create a new dataset.

>>> dataset = dsl.query("""search publications where journal.title="nature medicine" return publications[id+title+year+concepts] limit 100""")
Returned Publications: 100 (total = 12641)

Save the data to a local json file

>>> FILENAME = "test-api-save.json"
>>> dataset.to_json_file(FILENAME, verbose=True)
Saved to file:  test-api-save.json

Create a new DslDataset object by loading the contents of the JSON file.

>>> new_dataset = DslDataset.load_json_file(FILENAME, verbose=True)
Loaded file:  test-api-save.json
>>> print(new_dataset)
<dimcli.DslDataset object #4370267824. Records: 100/12641>
to_gsheets(title=None, verbose=True)[source]

Export the dataframe version of some API results to a public google sheet. Google OAUTH client credentials are a prerequisite for this method to work correctly.

Parameters
  • title (str, optional) – The spreadsheet title, if one wants to reuse an existing spreadsheet.

  • verbose (bool, default=True) – Verbose mode

Notes

This method assumes that the calling environment can provide valid Google authentication credentials. There are two routes to make this work, depending on whether one is using Google Colab or a traditional Jupyter environment.

Google Colab: This is the easiest route. In Google Colab, all required libraries are already available. The to_gsheets method simply triggers the built-in authentication process via a pop-up window.

Jupyter: This route involves a few more steps. In Jupyter, it is necessary to install the gspread, oauth2client and gspread_dataframe modules first. Secondly, one needs to create Google Drive access credentials using OAUTH (which boils down to a JSON file). Note that the credentials file needs to be saved in: ~/.config/gspread/credentials.json (for gspread). The steps are described at https://gspread.readthedocs.io/en/latest/oauth2.html#for-end-users-using-oauth-client-id.

Returns

The google sheet URL as a string.

Return type

str
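
Example

A minimal sketch, assuming valid Google credentials are available in the calling environment (the spreadsheet title is illustrative):

>>> dataset = dsl.query('search publications for "malaria" return publications limit 100')
>>> dataset.to_gsheets(title="malaria-publications-test")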

to_json_file(filename='', verbose=True)[source]

Export API results data to a JSON file.

Note: this is normally used in combination with the load_json_file method.

Parameters

filename (str, optional) – A filename/path where to save the data. If not provided, a unique name is generated automatically.

Returns

The string representation of the filename the data is saved to.

Return type

str

Example

Save the results of a query to a JSON file, then reload the same file and create a new dataset.

>>> dataset = dsl.query("""search publications where journal.title="nature medicine" return publications[id+title+year+concepts] limit 100""")
Returned Publications: 100 (total = 12641)

Save the data to a local json file

>>> FILENAME = "test-api-save.json"
>>> dataset.to_json_file(FILENAME, verbose=True)
Saved to file:  test-api-save.json

Data can be reloaded from file, using the load_json_file class method.

>>> new_dataset = DslDataset.load_json_file(FILENAME, verbose=True)
Loaded file:  test-api-save.json
>>> print(new_dataset)
<dimcli.DslDataset object #4370267824. Records: 100/12641>

dimcli.core.functions

Python wrappers for the DSL functions. See also: https://docs.dimensions.ai/dsl/functions.html NOTE: these objects are attached to the top level dimcli.functions module. So you can load them as follows:

>>> from dimcli.functions import *
dimcli.core.functions.build_reviewers_matrix(abstracts, candidates, max_concepts=15, connector='OR', source='publications', verbose=False)[source]

Generates a matrix of candidate reviewers for abstracts, using the expert identification workflow. See also https://docs.dimensions.ai/dsl/expert-identification.html

If the input abstracts include identifiers, then those are used in the resulting matrix. Alternatively, a simple list of strings as input will result in a matrix where the identifiers are auto-generated from the abstracts order (first one is 1, etc..).

Parameters
  • abstracts (list) – The list of abstracts used for matching reviewers. Should be either a list of strings, or a list of dictionaries {'id' : '{unique-ID}', 'text' : '{the-abstract}'} including a unique identifier for each abstract.

  • candidates (list) – A list of Dimensions researchers IDs.

  • max_concepts (int, optional) – The maximum number of concepts to use for the matching. By default, this is 15. Concepts are ranked by relevance.

  • connector (str, optional) – The logical connector used in the concepts query. Should be either ‘AND’, or ‘OR’ (=default).

  • source (str, optional) – The DSL source to derive experts from. Either ‘publications’ (default) or ‘grants’.

  • verbose (bool, optional) – Verbose mode, by default False

Returns

A dataframe containing experts details, including the dimensions URL of the experts.

Return type

pandas.DataFrame

Example

>>> from dimcli.functions import build_reviewers_matrix
>>> abstracts = [
...:     {
...:     'id' : 'A1',
...:     'text' : "We describe monocrystalline graphitic films, which are a few atoms "
...:              "thick but are nonetheless stable under ambient conditions, metallic, "
...:              "and of remarkably high quality. The films are found to be a "
...:              "two-dimensional semimetal with a tiny overlap between valence and "
...:              "conductance bands, and they exhibit a strong ambipolar electric field "
...:              "effect such that electrons and holes in concentrations up to 10^13 per "
...:              "square centimeter and with room-temperature mobilities of approximately "
...:              "10,000 square centimeters per volt-second can be induced by applying "
...:              "gate voltage."
...:     },
...:     {
...:     'id' : "A2",
...:     'text' : "The physicochemical properties of a molecule-metal interface, in "
...:              "principle, can play a significant role in tuning the electronic "
...:              "properties of organic devices. In this report, we demonstrate an "
...:              "electrode engineering approach in a robust, reproducible molecular "
...:              "memristor that enables a colossal tunability in both switching voltage "
...:              "(from 130 mV to 4 V i.e. >2500% variation) and current (by ~6 orders "
...:              "of magnitude). This provides a spectrum of device design parameters "
...:              "that can be 'dialed-in' to create fast, scalable and ultralow energy "
...:              "organic memristors optimal for applications spanning digital memory, "
...:              "logic circuits and brain-inspired computing."
...:     }
...: ]
...:
>>> candidates = ["ur.01146544531.57", "ur.011535264111.51", "ur.0767105504.29",
...:               "ur.011513332561.53", "ur.01055006635.53"]
>>> build_reviewers_matrix(abstracts, candidates)
           researcher         A1        A2
0   ur.01146544531.57   8.185277  0.000000
1  ur.011535264111.51   8.203130  0.000000
2    ur.0767105504.29   8.686363  2.626348
3  ur.011513332561.53  12.920304  1.551920
4   ur.01055006635.53   6.756862  1.797738
dimcli.core.functions.extract_affiliations(affiliations, as_json=False, include_input=False)[source]

Python wrapper for the DSL function extract_affiliations.

This function returns GRID affiliations either using structured or unstructured input. Up to 200 input objects are allowed per request. See also: https://docs.dimensions.ai/dsl/functions.html#function-extract-affiliations

The input argument affiliations can be one of the following:

  • a string, representing a single unstructured ‘affiliation’, eg

    “new york university”

  • a list of strings, representing unstructured ‘affiliations’, eg

    [“new york university”, “london college of surgeons”]

  • a list of dictionaries of unstructured ‘affiliations’ data, eg

    [{“affiliation”: “london college”}, {“affiliation”: “new york university”}]

  • a list of dictionaries of structured ‘affiliations’ data, eg

    [{“name”:”london college cambridge”, “city”:””, “state”:””, “country”:””}, {“name”:”milano bicocca”, “city”:”Milano”, “state”:””, “country”:”Italy”} ]

By default, the JSON results are flattened and returned as a pandas dataframe.

NOTE internally this function always uses the ‘batch processing’ version of the API. The optional argument results is currently not supported (and hence defaults to ‘basic’).

Parameters
  • affiliations (str or list or dict) – The raw affiliation data to process.

  • as_json (bool, optional) – Return raw JSON encoded as a Python dict (instead of a pandas dataframe, by default).

  • include_input (bool, optional, False) – For unstructured affiliation matching, return also a column input_affiliation with the original input string.

Returns

A pandas dataframe containing a flattened representation of the JSON results.

Return type

pandas.DataFrame or dict

Example

>>> from dimcli.functions import extract_affiliations
>>> extract_affiliations("stanford medical center")
n  affiliation_part        grid_id          grid_name grid_city  grid_state   grid_country  requires_review geo_country_id geo_country_name geo_country_code geo_state_id geo_state_name geo_state_code geo_city_id geo_city_name
0  stanford medical center  grid.240952.8  Stanford Medicine  Stanford  California  United States             True        6252001    United States               US      5332921     California          US-CA     5398563      Stanford
>>> data = [{"affiliation": "london college"}, {"affiliation": "new york university"}]
>>> extract_affiliations(data)
n  affiliation_part        grid_id            grid_name grid_city grid_state    grid_country  requires_review geo_country_id geo_country_name geo_country_code geo_state_id geo_state_name geo_state_code geo_city_id  geo_city_name
0  london college  grid.499389.6   The London College    London       None  United Kingdom             True        2635167   United Kingdom               GB      6269131        England           None     2643743         London
1  new york university  grid.137628.9  New York University  New York   New York   United States            False        6252001    United States               US      5128638       New York          US-NY     5128581  New York City
dimcli.core.functions.extract_classification(title, abstract, system='', verbose=True)[source]

Python wrapper for the DSL function classify.

This function retrieves suggested classifications codes for any text. See also: https://docs.dimensions.ai/dsl/functions.html#function-classify

NOTE system must be the acronym of one of the supported classification systems:

  • Fields of Research (FOR)

  • Research, Condition, and Disease Categorization (RCDC)

  • Health Research Classification System Health Categories (HRCS_HC)

  • Health Research Classification System Research Activity Classifications (HRCS_RAC)

  • Health Research Areas (HRA)

  • Broad Research Areas (BRA)

  • ICRP Common Scientific Outline (ICRP_CSO)

  • ICRP Cancer Types (ICRP_CT)

  • Units of Assessment (UOA)

  • Sustainable Development Goals (SDG)

Parameters
  • title (str) – The title of the document to classify.

  • abstract (str) – The abstract of the document to classify.

  • system (str, optional) – The classification system to use. Either an acronym from the supported classification systems, or null. If no system is provided, all systems are attempted in sequence (one query per system).

  • verbose (bool, optional) – Verbose mode, by default True

Returns

A Dimcli wrapper object containing JSON data.

Return type

dimcli.DslDataset

Example

>>> from dimcli.functions import extract_classification
>>> title="Burnout and intentions to quit the practice among community pediatricians: associations with specific professional activities"
>>> extract_classification(title, "", "FOR").json
{'FOR': [{'id': '3177', 'name': '1117 Public Health and Health Services'}]}
dimcli.core.functions.extract_concepts(text, scores=True, as_df=True)[source]

Python wrapper for the DSL function extract_concepts.

Extract concepts from any text. Text input is processed and extracted concepts are returned as an array of strings ordered by their relevance. See also: https://docs.dimensions.ai/dsl/functions.html#function-extract-concepts

Parameters
  • text (str) – The text paragraphs to extract concepts from.

  • scores (bool, optional) – Return the concepts scores as well, by default True

  • as_df (bool, optional) – Return results as a pandas dataframe (instead of JSON), by default True

Returns

The list of concepts that have been extracted.

Return type

pandas.DataFrame or dimcli.DslDataset

Example

>>> from dimcli.functions import extract_concepts
>>> extract_concepts("The impact of solar rays on the moon is not trivial.")
n   concept relevance
0   impact  0.070622
1   rays    0.062369
2   solar rays      0.022934
3   Moon    0.013245
dimcli.core.functions.extract_grants(grant_number, fundref='', funder_name='')[source]

Python wrapper for the DSL function extract_grants.

Extract grant Dimensions ID from provided parameters. Grant number must be provided with either a fundref or a funder name as an argument. See also: https://docs.dimensions.ai/dsl/functions.html#function-extract-grants

Parameters
  • grant_number (str) – The grant number/ID

  • fundref (str, optional) – Fundref name

  • funder_name (str, optional) – Funder name

Returns

A Dimcli wrapper object containing JSON data.

Return type

dimcli.DslDataset

Example

>>> from dimcli.functions import extract_grants
>>> extract_grants("R01HL117329",  fundref="100000050").json
{'grant_id': 'grant.2544064'}
dimcli.core.functions.identify_experts(abstract, max_concepts=15, connector='OR', conflicts=None, extra_dsl='where year >= 2010', source='publications', verbose=False)[source]

Python wrapper for the expert identification workflow. See also https://docs.dimensions.ai/dsl/expert-identification.html

This wrapper provides a simpler version of the expert identification API. It is meant to be a convenient alternative for basic queries. For more options, it is advised to use the API directly.

Parameters
  • abstract (str) – The abstract text used to identify experts. Concepts are automatically extracted from it.

  • max_concepts (int, optional) – The maximum number of concepts to use for the identification. By default, this is 15. Concepts are ranked by relevance.

  • connector (str, optional) – The logical connector used in the concepts query. Should be either ‘AND’, or ‘OR’ (=default).

  • conflicts (list, optional) – A list of Dimensions researchers IDs used to determine overlap / conflicts of interest.

  • extra_dsl (str, optional) – A DSL clause to add after the main concepts search statement. Default is where year >= 2010.

  • source (str, optional) – The DSL source to derive experts from. Either ‘publications’ (default) or ‘grants’.

  • verbose (bool, optional) – Verbose mode, by default False

Returns

A dataframe containing experts details, including the dimensions URL of the experts.

Return type

pandas.DataFrame

Example

>>> from dimcli.functions import identify_experts
>>> identify_experts("Moon landing paved the way for supercomputers becoming mainstream", verbose=True)
Concepts extracted: 5
Query:
"
identify experts
    from concepts ""landing" OR "way" OR "mainstream" OR "moon landing" OR "supercomputers""
    using publications where year >= 2010
return experts[id+first_name+last_name+dimensions_url-obsolete]
"
Experts found: 20
[..experts list..]

dimcli.utils.dimensions

Dimcli utilities for querying and working with Dimensions data. NOTE: these functions are attached to the top level dimcli.utils module. So you can load them as follows:

>>> from dimcli.utils import *
dimcli.utils.dim_utils.dimensions_search_url(keywords_list_as_string)[source]

Generate a valid keyword search URL for Dimensions.

Parameters

keywords_list_as_string (str) – List of search keywords.

Returns

The Dimensions URL.

Return type

str

Example

>>> from dimcli.utils import dimensions_search_url
>>> dimensions_search_url("graphene AND south korea")
'https://app.dimensions.ai/discover/publication?search_text=graphene%20AND%20south%20korea&search_type=kws&search_field=full_search'
dimcli.utils.dim_utils.dimensions_styler(df, source_type='', title_links=True)[source]

Format the text display value of a dataframe by including Dimensions hyperlinks whenever possible. Useful mainly in notebooks, when printing out dataframes with clickable links. Expects column names to match the default DSL field names.

Parameters
  • df (pd.Dataframe) – Pandas dataframe obtained from a DSL query e.g. via the as_dataframe methods.

  • source_type (str, optional) – The name of the source: one of ‘publications’, ‘grants’, ‘patents’, ‘policy_documents’, ‘clinical_trials’, ‘researchers’. If not provided, it can be inferred in some cases.

  • title_links (bool, optional, True) – Hyperlink document titles too, using the ID (if available).

Notes

Implemented using https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.format.html. A side effect is that the resulting object becomes an instance of pandas.io.formats.style.Styler, which is a wrapper around the underlying dataframe. TIP: to get back to the original dataframe, you can use the .data attribute. See also: https://stackoverflow.com/questions/42263946/how-to-create-a-table-with-clickable-hyperlink-in-pandas-jupyter-notebook

Returns

Wrapper for a dataframe object, including custom Dimensions hyperlinks.

Return type

pandas.io.formats.style.Styler

Example

>>> from dimcli.utils import dimensions_styler
>>> dsl = dimcli.Dsl()
>>> q = 'search publications for "scientometrics" return publications[basics]'
>>> df = dsl.query(q).as_dataframe()
>>> dimensions_styler(df)
#
# alternatively, using the shortcut method:
#
>>> dsl.query(q).as_dataframe(links=True)
dimcli.utils.dim_utils.dimensions_url(obj_id, obj_type='', verbose=True)[source]

Generate a valid Dimensions URL for one of the available sources.

Parameters
  • obj_id (str) – A Dimensions ID for one of the available sources.

  • obj_type (str, optional) – The name of the source: one of ‘publications’, ‘grants’, ‘patents’, ‘policy_documents’, ‘clinical_trials’, ‘researchers’. If not provided, it’s inferred using the ID structure.

Returns

The object URL.

Return type

str

Example

>>> from dimcli.utils import dimensions_url
>>> dimensions_url("pub.1127419018")
'https://app.dimensions.ai/details/publication/pub.1127419018'
dimcli.utils.dim_utils.dsl_escape(stringa, all=False)[source]

Helper for escaping full-text inner query strings, when they include quotes.

E.g. with the query string: '"2019-nCoV" OR "COVID-19" OR "SARS-CoV-2" OR (("coronavirus" OR "corona virus") AND (Wuhan OR China))'

In Python, if you want to embed it into a DSL query, it has to become: '\"2019-nCoV\" OR \"COVID-19\" OR \"SARS-CoV-2\" OR ((\"coronavirus\" OR \"corona virus\") AND (Wuhan OR China))'

See also: https://docs.dimensions.ai/dsl/language.html#for-search-term

Parameters
  • stringa (str) – Full-text search component of a DSL query.

  • all (bool, default=False) – By default only quotes are escaped. Set to True to escape all special characters (e.g. colons)

Example

>>> dsl_escape('Solar cells: a new technology?', True)
'Solar cells\: a new technology?'
dimcli.utils.dim_utils.dslquery(query_string)[source]

Shortcut for running a query without instantiating dimcli.Dsl().

Added for backward compatibility with legacy API tutorials. Requires file-based credentials for logging in.

Parameters

query_string (str) – A valid DSL query.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset
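
Example

A minimal sketch, assuming a local dsl.ini credentials file is set up:

>>> from dimcli.utils import dslquery
>>> dslquery("search publications return publications limit 10")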

dimcli.utils.dim_utils.dslquery_json(query_string)[source]

Shortcut for running a query without instantiating dimcli.Dsl(). Same as dslquery, but returns raw JSON instead of a DslDataset object.

Added for backward compatibility with legacy API tutorials. Requires file-based credentials for logging in.

Parameters

query_string (str) – A valid DSL query.

Returns

API JSON data, represented as a dict object.

Return type

Dict

dimcli.utils.dim_utils.dslqueryall(query_string)[source]

Shortcut for running a loop query without instantiating dimcli.Dsl().

Added for backward compatibility with legacy API tutorials. Requires file-based credentials for logging in.

Parameters

query_string (str) – A valid DSL query.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

dimcli.utils.dim_utils.gen_dslqueries(sources, text='Albert Einstein')[source]

Generate test DSL queries for each source.

Example

>>> from dimcli import G
>>> gen_dslqueries(G.sources())

dimcli.utils.miscellaneous

Dimcli general purpose utilities for working with data. NOTE: these functions are attached to the top level dimcli.utils module. So you can import them as follows:

>>> from dimcli.utils import *
dimcli.utils.misc_utils.chunks_of(data, size)[source]

Split up a list or sequence into chunks of the selected size.

Parameters
  • data (sequence) – A sequence eg a list that needs to be chunked.

  • size (int) – The number of items in each group.

Returns

An iterable

Return type

Iterator

Example

>>> from dimcli.utils import chunks_of
>>> a = range(10)
>>> for x in chunks_of(a, 5):
        print(len(x))
5
5
>>> list(chunks_of(a, 5))
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
dimcli.utils.misc_utils.exists_key_in_dicts_list(dict_list, key)[source]

From a list of dicts, checks if a certain key is in one of the dicts in the list.

See also https://stackoverflow.com/questions/14790980/how-can-i-check-if-key-exists-in-list-of-dicts-in-python

Parameters
  • dict_list (list) – A list of dictionaries.

  • key (obj) – The key to be searched for in the dicts.

Returns

The dictionary containing the key, if found, otherwise None.

Return type

Dict or None
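
Example

A minimal sketch; the return value shown assumes the matching dict is returned, as the return type above suggests:

>>> from dimcli.utils import exists_key_in_dicts_list
>>> exists_key_in_dicts_list([{'a': 1}, {'b': 2}], 'b')
{'b': 2}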

dimcli.utils.misc_utils.explode_nested_repeated_field(dataframe, field_name)[source]

Utility that can be run against any nested repeated field returned by the API, in order to flatten it so that it is more easily used in spreadsheets and other tools.

Parameters
  • dataframe (pd.Dataframe) – A dataframe object.

  • field_name (string) – The column of the dataframe to be exploded.

Returns

A new dataframe with new columns corresponding to the flattened column. The new columns prefix is the original column label.

Return type

pd.DataFrame
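
Example

A minimal sketch (category_for is just an illustrative nested repeated field; any column holding lists of dicts works):

>>> from dimcli.utils import explode_nested_repeated_field
>>> df = dsl.query("search publications return publications[id+category_for] limit 10").as_dataframe()
>>> flat = explode_nested_repeated_field(df, "category_for")
# => the new columns are prefixed with the original label, e.g. 'category_for.id'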

dimcli.utils.misc_utils.export_as_gsheets(input_data, query='', title=None, verbose=True)[source]

Save data to google sheets in one line.

Works with raw JSON (from the API) or a DataFrame.

Parameters
  • input_data (JSON or DataFrame) – The data to be uploaded

  • query (str) – The DSL query; this is needed only when raw API JSON is passed

  • title (str, optional) – The spreadsheet title, if one wants to reuse an existing spreadsheet.

  • verbose (bool, default=True) – Verbose mode

Notes

This method assumes that the calling environment can provide valid Google authentication credentials. There are two routes to make this work, depending on whether one is using Google Colab or a traditional Jupyter environment.

Google Colab: This is the easiest route. In Google Colab, all required libraries are already available. The export_as_gsheets method simply triggers the built-in authentication process via a pop-up window.

Jupyter: This route involves a few more steps. In Jupyter, it is necessary to install the gspread, oauth2client and gspread_dataframe modules first. Secondly, one needs to create Google Drive access credentials using OAUTH (which boils down to a JSON file). Note that the credentials file needs to be saved in: ~/.config/gspread/credentials.json (for gspread to work correctly). These steps are described at https://gspread.readthedocs.io/en/latest/oauth2.html#for-end-users-using-oauth-client-id.

Returns

The google sheet URL as a string.

Return type

str

Example

>>> import pandas as pd
>>> from dimcli.utils import export_as_gsheets
>>> cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
             'Price': [22000,25000,27000,35000]
             }
>>> df = pd.DataFrame(cars, columns = ['Brand', 'Price'])
>>> export_as_gsheets(df)
..authorizing with google..
..creating a google sheet..
..uploading..
Saved:
https://docs.google.com/spreadsheets/d/1tsyRFDEsADltWDdqjuyDWDOg81sl9hN3Nu8MXVlqDDI
dimcli.utils.misc_utils.google_url(stringa)[source]

Generate a valid google search URL from a string (URL quoting is applied).

Example

>>> from dimcli.utils import google_url
>>> google_url("malaria AND africa")
'https://www.google.com/search?q=malaria%20AND%20africa'
dimcli.utils.misc_utils.normalize_key(key_name, dict_list, new_val=None)[source]

Ensures a key always appears in a list of JSON dicts/objects by adding it when missing. Used to prepare API results for subsequent data processing operations, where a missing key in the records may lead to unwanted errors.

UPDATE 2019-11-28 v0.6.1.2: normalizes also ‘None’ values (to address 1.21 DSL change)

Parameters
  • key_name (obj) – The dict key to normalize.

  • dict_list (list) – List of dictionaries where to be processed.

  • new_val (obj, optional) – Default value to add to the key, when not found. If new_val is not passed, it is inferred from first available non-empty value.

Returns

The same dictionary that was passed in. Changes happen in-place.

Return type

dict

Example

>>> for x in pubs_details.publications:
        if not 'FOR' in x:
            x['FOR'] = []

becomes simply:

>>> normalize_key("FOR", pubs_details.publications)
dimcli.utils.misc_utils.open_multi_platform(fpath)[source]

Open a file using the native OS tools, taking care of platform differences.

Supports win, macos and linux.

dimcli.utils.misc_utils.printDebug(text, mystyle='', err=True, **kwargs)[source]

Wrapper around click.secho() for printing in colors with various defaults.

Parameters
  • text (string) – The text to print

  • mystyle (string) – One of: comment, important, normal, red, error, green

  • err (boolean, default: True) – By default, print to standard error (err=True). This means that the output behaves well with less and when piped to other commands (or files).

  • kwargs (dict) – Pass any other named parameter accepted by click.secho(), eg you can do printDebug(“s”, bold=True)

Notes

Styles a text with ANSI styles and returns the new string. See https://click.palletsprojects.com/en/5.x/api/#click.echo and http://click.pocoo.org/5/api/#click.style. By default the styling is self-contained, which means that a reset code is issued at the end of the string. This can be prevented by passing reset=False.

Supported click color names: black (might be a gray), red, green, yellow (might be an orange), blue, magenta, cyan, white (might be light gray), reset (reset the color code only).

Supported click parameters:

  • text – the string to style with ansi codes.

  • fg – if provided this will become the foreground color.

  • bg – if provided this will become the background color.

  • bold – if provided this will enable or disable bold mode.

  • dim – if provided this will enable or disable dim mode (badly supported).

  • underline – if provided this will enable or disable underline.

  • blink – if provided this will enable or disable blinking.

  • reverse – if provided this will enable or disable inverse rendering (foreground becomes background and the other way round).

  • reset – by default a reset-all code is added at the end of the string, which means that styles do not carry over. This can be disabled to compose styles.

Example

>>> printDebug("My comment", "comment")
>>> printDebug("My warning", "important")
# This works also with inner click styles eg
>>> uri, title = "http://example.com", "My ontology"
>>> printDebug(click.style("[%d]" % 1, fg='blue') +
           click.style(uri + " ==> ", fg='black') +
           click.style(title, fg='red'))
# or even with Colorama
>>> from colorama import Fore, Style
>>> printDebug(Fore.BLUE + Style.BRIGHT + "[%d]" % 1 +
        Style.RESET_ALL + uri + " ==> " + Fore.RED + title +
        Style.RESET_ALL)
# Memo: how the underlying click.echo works:
>>> click.echo(click.style('Hello World!', fg='green'))
>>> click.echo(click.style('ATTENTION!', blink=True))
>>> click.echo(click.style('Some things', reverse=True, fg='cyan'))
Returns

The colorized text.

Return type

str

dimcli.utils.misc_utils.printInfo(text, mystyle='', **kwargs)[source]

Wrapper around printDebug that ALWAYS prints to stdout. This means that the output can be grepped and will be picked up by pipes.

Fixes https://github.com/lambdamusic/Ontospy/issues/76

dimcli.utils.misc_utils.save2File(contents, filename, path)[source]

Save string contents to a file, creating the file if it doesn’t exist.

NOTE Not generalized much, so use at your own risk.

Parameters
  • contents (str) – File contents

  • filename (str) – Name of the file.

  • path (str) – The folder where the file is saved. If it does not exist, it gets created.

Returns

The file path with format “file://…”

Return type

str

dimcli.utils.misc_utils.walk_up(bottom)[source]

Mimic os.walk, but walk ‘up’ instead of down the directory tree

Example

Print all files and directories directly above the current one:

>>> for i in walk_up(os.curdir):
>>>     print(i)

Look for a TAGS file above the current directory:

>>> for c, d, f in walk_up(os.curdir):
>>>     if 'TAGS' in f:
>>>         print(c)
>>>         break

dimcli.utils.converters

class dimcli.utils.converters.DslClinicaltrialsConverter(df, verbose=False)[source]

Bases: dimcli.utils.converters.DslDataConverter

class dimcli.utils.converters.DslDataConverter(df, object_type='', verbose=False)[source]

Bases: object

Helper class containing methods for transforming complex JSON snippets to other formats. Useful e.g. for creating a nice-looking CSV from raw API data.

Status: ALPHA - UNSUPPORTED FEATURE

Converter subclasses are available only for:

  • Pubs

  • Grants

  • Clinical Trials

  • Datasets

  • Patents

To review:

  • Organizations

  • Policy Documents

  • Researchers

Example

>>> from dimcli.utils.converters import *
>>> df_temp = dsl.query_iterative("search publications return publications").as_dataframe()
>>> c1 = DslPubsConverter(df_temp)
>>> df_final = c1.run()

# Iterate through all keys/columns in the dataframe:
#
# if column name == key in fields_mappings:
#     apply all functions => generate new columns, remove old column
# else if column value is a list:
#     break down the list into a semicolon-delimited string, replace old column
#
# PS: dimensions_url is a special case, we just add a new column without removing ‘ID’;
# also, it applies only to sources.

apply_transformations()[source]

For each column, see if there is a transformation defined, and apply it.

Parameters

keep_extra_cols (bool, default=True) – Columns not included in the transformation rules are kept by default.

convert_abstract_to_preview(abstract)[source]
convert_authors_affiliations(authorslist)[source]
convert_authors_countries(authorslist)[source]
convert_authors_grids(authorslist)[source]
convert_authors_to_names(authorslist)[source]
convert_id_to_url(idd, ttype=None)[source]
convert_interventions_dict(data)[source]

Return ‘name’ and ‘type’ for clinical trials / interventions

From: “[{‘arm_group_labels’: ‘Hydroxychloroquine and conventional treatments’, ‘type’: ‘Drug’, ‘description’: ‘Subjects take hydroxychloroquine 400 mg per day for 5 days, also take conventional treatments’, ‘other_names’: ‘’, ‘name’: ‘Hydroxychloroquine’}]”

To: “Hydroxychloroquine (Drug)”

convert_investigators_cltrials(investigatorslist)[source]

From: [['Chaoqian Li', '', 'Study leader', '6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region, China', '', ''], ['Jianlin Huang', '', 'Applicant', "Beihai People's Hospital", "Beihai People's Hospital", 'grid.452719.c']]

To: "Chaoqian Li; Jianlin Huang"

extend_transformations()[source]

Add default transformations for all fields found in a df (not just the ones defined explicitly) using standard rules (camel case and spacing).

run(keep_extra_cols=True)[source]

@TODO define a suitable abstraction for automatic transformation eg simplify all fields to strings

sort_and_prune(new_cols_ordered_list=None)[source]

Generate a default column order if not provided, keeping only those columns.

truncate_for_gsheets(cols_subset=None)[source]

Helper to avoid the gsheets error ‘Your input contains more than the maximum of 50000 characters in a single cell.’

cols_subset: e.g. [‘Abstract’, ‘Authors’, ‘Authors Affiliations’]

class dimcli.utils.converters.DslDatasetsConverter(df, verbose=False)[source]

Bases: dimcli.utils.converters.DslDataConverter

class dimcli.utils.converters.DslGrantsConverter(df, verbose=False)[source]

Bases: dimcli.utils.converters.DslDataConverter

class dimcli.utils.converters.DslOrganizationsConverter(df, verbose=False)[source]

Bases: dimcli.utils.converters.DslDataConverter

@TODO review

class dimcli.utils.converters.DslPatentsConverter(df, verbose=False)[source]

Bases: dimcli.utils.converters.DslDataConverter

class dimcli.utils.converters.DslPolicyDocumentsConverter(df, verbose=False)[source]

Bases: dimcli.utils.converters.DslDataConverter

@TODO review

class dimcli.utils.converters.DslPubsConverter(df, verbose=False)[source]

Bases: dimcli.utils.converters.DslDataConverter

class dimcli.utils.converters.DslReportsConverter(df, verbose=False)[source]

Bases: dimcli.utils.converters.DslDataConverter

@TODO review

class dimcli.utils.converters.DslResearchersConverter(df, verbose=False)[source]

Bases: dimcli.utils.converters.DslDataConverter

@TODO review

class dimcli.utils.converters.DslSourceTitlesConverter(df, verbose=False)[source]

Bases: dimcli.utils.converters.DslDataConverter

@TODO review

dimcli.jupyter.magics

Dimcli magic commands used with iPython / Jupyter environments only. See also: https://api-lab.dimensions.ai/cookbooks/1-getting-started/4-Dimcli-magic-commands.html

NOTE: all magic command results get saved automatically to a variable named dsl_last_results.

class dimcli.jupyter.magics.DslMagics(**kwargs)[source]

Bases: IPython.core.magic.Magics

dsl(line, cell=None)[source]

Magic command to run a single DSL query.

Can be used as a single-line (%dsl) or multi-line (%%dsl) command. Requires an authenticated API session. If used as a multi-line command, a variable name can be specified as the first argument. Otherwise, the results are saved to a variable called dsl_last_results.

Parameters

line (str) – A valid DSL search query.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

Example

>>> %dsl search publications for "malaria" return publications limit 500
>>> %%dsl my_data
...    search publications for "malaria" return publications limit 500
dsl_extract_concepts(line, cell=None)[source]

Magic command to run the extract_concepts function. Results are transformed to a Pandas DataFrame. Scores are included by default.

If used as a multi-line command, a variable name can be specified as the first argument. Otherwise, the results are saved to a variable called dsl_last_results.

Parameters

cell (str) – Text to extract concepts from.

Returns

A pandas dataframe containing the concepts and scores.

Return type

pandas.DataFrame

Example

>>> %%extract_concepts
... <text>
dsl_identify_experts(line, cell=None)[source]

Magic command to run the identify_experts function. Uses all the default options, takes only the abstract argument.

If used as a multi-line command, a variable name can be specified as the first argument. Otherwise, the results are saved to a variable called dsl_last_results.

Parameters

cell (str) – Text abstract to use to find experts.

Returns

A pandas dataframe containing experts details.

Return type

pandas.DataFrame

Example

>>> %%identify_experts
... <text>
dsldf(line, cell=None)[source]

Magic command to run a single DSL query, results are transformed to a Pandas DataFrame.

Can be used as a single-line (%dsldf) or multi-line (%%dsldf) command. Requires an authenticated API session. If used as a multi-line command, a variable name can be specified as the first argument. Otherwise, the results are saved to a variable called dsl_last_results.

Flags:

--links => style the dataframe with links to the original data sources.

--nice => break down complex structures into strings (EXPERIMENTAL).

Parameters
  • line (str) – A valid DSL search query, or, for multiline commands, a parameter name and/or formatting flags.

  • cell (str) – A valid DSL search query.

Returns

A pandas dataframe containing the query results.

Return type

pandas.DataFrame

Example

>>> %dsldf search publications for "malaria" return publications limit 500
>>> %%dsldf --links
...    search publications for "malaria" return publications limit 500
>>> %%dsldf my_data
...    search publications for "malaria" return publications limit 500
dsldocs(line, cell=None)[source]

Magic command to get DSL documentation about sources and fields.

This is a wrapper around the DSL describe function.

Parameters

line (str, optional) – The DSL source or entity name to get documentation for. If omitted, all the documentation is downloaded.

Returns

A pandas dataframe containing the query results.

Return type

pandas.DataFrame

Example

>>> %dsldocs publications
dslgsheets(line, cell=None)[source]

Magic command to run a single DSL query and to save the results to a google sheet.

NOTE: this method requires preexisting valid Google authentication credentials. See the description of utils.export_as_gsheets for more information.

Can be used as a single-line (%dslgsheets) or multi-line (%%dslgsheets) command. Requires an authenticated API session. If used as a multi-line command, a variable name can be specified as the first argument. Otherwise, the results are saved to a variable called dsl_last_results.

Parameters

line (str) – A valid DSL search query.

Returns

A string representing the google sheet URL.

Return type

str

Example

>>> %dslgsheets search publications for "malaria" return publications limit 500
dslloop(line, cell=None)[source]

Magic command to run a DSL ‘loop’ (iterative) query.

This command automatically loops over all the pages of a results set, until all possible records have been returned.

Can be used as a single-line (%dslloop) or multi-line (%%dslloop) command. Requires an authenticated API session. If used as a multi-line command, a variable name can be specified as the first argument. Otherwise, the results are saved to a variable called dsl_last_results.

Parameters

line (str) – A valid DSL search query. Should not include limit/skip clauses, as those are added automatically during the iterations.

Returns

A Dimcli wrapper object containing JSON data.

Return type

DslDataset

Example

>>> %dslloop search publications for "malaria" where times_cited > 200 return publications
>>> %%dslloop my_data
...    search publications for "malaria" return publications limit 500
dslloopdf(line, cell=None)[source]

Magic command to run a DSL ‘loop’ (iterative) query. Results are automatically transformed to a pandas dataframe.

Can be used as a single-line (%dslloopdf) or multi-line (%%dslloopdf) command. Requires an authenticated API session. If used as a multi-line command, a variable name can be specified as the first argument. Otherwise, the results are saved to a variable called dsl_last_results.

Pass the --links flag to style the dataframe with links to the original data sources.

Parameters

line (str) – A valid DSL search query. Should not include limit/skip clauses, as those are added automatically during the iterations.

Returns

A pandas dataframe containing the query results.

Return type

pandas.DataFrame

Example

>>> %dslloopdf search publications for "malaria" where times_cited > 200 return publications
>>> %%dslloopdf --links
...    search publications for "malaria" return publications limit 500
>>> %%dslloopdf my_data
...    search publications for "malaria" return publications limit 500
dslloopgsheets(line, cell=None)[source]

Magic command to run a DSL ‘loop’ (iterative) query. Results are automatically uploaded to google sheets.

NOTE: this method requires preexisting valid Google authentication credentials. See also https://gspread.readthedocs.io/en/latest/oauth2.html and the description of utils.export_as_gsheets for more information.

Can be used as a single-line (%dslloopgsheets) or multi-line (%%dslloopgsheets) command. Requires an authenticated API session. If used as a multi-line command, a variable name can be specified as the first argument. Otherwise, the results are saved to a variable called dsl_last_results.

Parameters

line (str) – A valid DSL search query. Should not include limit/skip clauses, as those are added automatically during the iterations.

Returns

A string representing the google sheet URL.

Return type

str

Example

>>> %dslloopgsheets search publications for "malaria" where times_cited > 200 return publications