
Exploring The Dimensions Search Language (DSL) - Quick Intro

This notebook walks you through the basics of querying the Dimensions API with the DSL.

In this tutorial we leverage the capabilities of the Dimcli library in the context of Jupyter Notebooks. Dimcli is an open source Python library that simplifies common operations like logging in, querying and displaying results.

[1]:
import datetime
print("==\nCHANGELOG\nThis notebook was last run on %s\n==" % datetime.date.today().strftime('%b %d, %Y'))
==
CHANGELOG
This notebook was last run on Jan 24, 2022
==

Prerequisites

This notebook assumes you have installed the Dimcli library and are familiar with the ‘Getting Started’ tutorial.

[1]:
!pip install dimcli -U --quiet

import dimcli
from dimcli.utils import *
import sys

print("==\nLogging in..")
# https://digital-science.github.io/dimcli/getting-started.html#authentication
ENDPOINT = "https://app.dimensions.ai"
if 'google.colab' in sys.modules:
  import getpass
  KEY = getpass.getpass(prompt='API Key: ')
  dimcli.login(key=KEY, endpoint=ENDPOINT)
else:
  KEY = ""
  dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()
Searching config file credentials for 'https://app.dimensions.ai' endpoint..
==
Logging in..
Dimcli - Dimensions API Client (v0.9.6)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.0
Method: dsl.ini file

What the query statistics refer to

When you perform a DSL search, a _stats object is returned that contains useful information, e.g. the total number of records available for the search.

[2]:
res1 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications""", verbose=False)
print(res1.stats)  # shorthand for res1.json['_stats']
{'total_count': 5807}

Note, however, that the total number always refers to the main source being searched, not necessarily to the results being returned. For example, in this query we return researchers linked to publications:

[3]:
res2 = dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers""", verbose=False)
print(res2.stats)
{'total_count': 5807}

Still 5807 records! That’s because the total count always refers to the main object type being searched (here, publications), not to the facet being returned.

Tip: this basic information about objects returned is also available via the count_batch and count_total methods of the query results object.

[4]:
result = dsl.query("""
     search publications
       for "malaria AND congo"
     return publications[basics]
     limit 30
""", verbose=False)
# print some stats using the Result object
print("Results in this batch: ", result.count_batch)
print("Results in total: ", result.count_total)
print("Errors: ",result.errors)
Results in this batch:  30
Results in total:  86890
Errors:  None
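To make the relationship between the batch count and the total count concrete, here is a minimal offline sketch. It assumes the JSON payload keeps the total under `_stats` and the records under a key named after the source; the `payload` dictionary and the `batch_summary` helper are hypothetical stand-ins written for illustration, not part of Dimcli.

```python
# Hand-written stand-in for a DSL JSON payload; the record IDs are placeholders.
payload = {
    "_stats": {"total_count": 86890},
    "publications": [{"id": f"placeholder-{i}"} for i in range(30)],
}


def batch_summary(payload, source="publications"):
    """Return the batch size, the total matches, and the records not yet retrieved."""
    batch = len(payload.get(source, []))      # records in this response
    total = payload["_stats"]["total_count"]  # all records matching the search
    return {"batch": batch, "total": total, "remaining": total - batch}


summary = batch_summary(payload)
print(summary)
```

The "remaining" figure is only meaningful when the returned key is the source itself. For facet results, the total still counts records of the source.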

Working with fields

Note: in the following examples we use the magic command %%dsldf for quicker querying.

Control the fields you return

[5]:
%%dsldf

search publications
return publications[id+title+year+doi]
limit 5
Returned Publications: 5 (total = 124736479)
Time: 2.29s
[5]:
doi id title year
0 10.13170/depik.10.3.22492 pub.1144593888 Profile of ectoparasites and biometric conditi... 2022
1 10.1007/s11708-021-0812-6 pub.1144587500 Experimental study of stratified lean burn cha... 2022
2 10.1145/3480027 pub.1141731113 Opportunities and Challenges in Code Search Tools 2022
3 10.1145/3479393 pub.1141731112 Ransomware Mitigation in the Modern Era: A Com... 2022
4 10.1145/3478680 pub.1141731111 Service Computing for Industry 4.0: State of t... 2022

Make a mistake, and the DSL will tell you which fields you could have used

[6]:
%%dsldf

search publications
return publications[dois]
limit 100
Returned Errors: 1
Time: 4.06s
1 QueryError found
Semantic errors found:
        Field / Fieldset 'dois' is not present in Source 'publications'. Available fields: abstract,acknowledgements,altmetric,altmetric_id,arxiv_id,authors,authors_count,book_doi,book_series_title,book_title,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_uoa,clinical_trial_ids,concepts,concepts_scores,date,date_inserted,date_online,date_print,dimensions_url,doi,field_citation_ratio,funder_countries,funders,id,issn,issue,journal,journal_lists,journal_title_raw,linkout,mesh_terms,open_access,pages,pmcid,pmid,proceedings_title,publisher,recent_citations,reference_ids,referenced_pubs,relative_citation_ratio,research_org_cities,research_org_countries,research_org_country_names,research_org_names,research_org_state_codes,research_org_state_names,research_orgs,researchers,resulting_publication_doi,source_title,subtitles,supporting_grant_ids,times_cited,title,type,volume,year and available fieldsets: basics,book,categories,extras
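Because the error message lists every valid field, it can be used to suggest a likely correction. The sketch below is one possible approach, assuming the message keeps the format shown above; `suggest_fields` is a hypothetical helper, and the field list in `ERROR` is shortened for readability.

```python
import difflib
import re

# Error text in the format shown above, with the field list shortened.
ERROR = (
    "Field / Fieldset 'dois' is not present in Source 'publications'. "
    "Available fields: abstract,authors,date,doi,id,title,year "
    "and available fieldsets: basics,book,categories,extras"
)


def suggest_fields(message, cutoff=0.6):
    """Return the available field names that most resemble the rejected one."""
    bad = re.search(r"Field / Fieldset '([^']+)'", message)
    fields = re.search(r"Available fields: (\S+)", message)
    if not (bad and fields):
        return []  # message does not follow the assumed format
    candidates = fields.group(1).split(",")
    return difflib.get_close_matches(bad.group(1), candidates, n=3, cutoff=cutoff)


suggestions = suggest_fields(ERROR)
print(suggestions)
```

The matching relies on `difflib` from the Python standard library, so the idea can be tried without an API connection.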

...or search for publications by a specific researcher ID

[11]:
%%dsldf

search publications
where researchers.id = "ur.013514345521.07"
return publications[doi+researchers]
limit 1
Returned Publications: 1 (total = 22)
Time: 2.68s
[11]:
doi researchers
0 10.1201/9781003042570-10 [{'first_name': 'Rashi', 'id': 'ur.01001350755...

Sources VS Facets

One of the queries above uses the researchers facet of the publications source.

In general, source queries can return up to 1000 records per query. For example, this query returns an error:

[12]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 2000
  """)
Returned Errors: 1
Time: 0.57s
1 QueryError found
Semantic errors found:
        Limit 2000 exceeds maximum allowed limit 1000
[12]:
<dimcli.DslDataset object #4812964912. Errors: 1>

You can paginate through source results up to 50000 rows

With sources, you can use the limit/skip syntax in order to paginate through results:

[13]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return publications limit 1000 skip 1000
  """)
Returned Publications: 1000 (total = 5807)
Time: 2.40s
[13]:
<dimcli.DslDataset object #4407315520. Records: 1000/5807>
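The limit/skip pattern can be generalized into a list of page queries. The following is a minimal sketch, assuming a page size of at most 1000 and the 50000-row ceiling mentioned above; `page_queries` is a hypothetical helper that only builds query strings and does not call the API.

```python
def page_queries(base_query, total, page_size=1000, max_rows=50000):
    """Build one DSL query string per page using limit/skip."""
    if not 1 <= page_size <= 1000:
        raise ValueError("page_size must be between 1 and 1000")
    reachable = min(total, max_rows)  # rows beyond max_rows cannot be paged to
    return [
        f"{base_query} limit {min(page_size, reachable - skip)} skip {skip}"
        for skip in range(0, reachable, page_size)
    ]


BASE = (
    "search publications "
    'where year in [2013:2018] and research_orgs="grid.258806.1" '
    "return publications"
)

queries = page_queries(BASE, total=5807)
for q in queries:
    print(q)
```

Each string could then be passed to `dsl.query` in turn. Dimcli also provides a `query_iterative` method that handles this kind of loop for you; see the Dimcli documentation for its exact behavior.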

You can return at most 1000 facet rows

Remember that facets do not support the skip operation, so a facet query can never return more than 1000 records.

[14]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1 skip 1000
  """)
Returned Errors: 1
Time: 0.95s
1 QueryError found
Semantic errors found:
        Offset is not supported for facet results
[14]:
<dimcli.DslDataset object #4811599632. Errors: 1>

While this works…

[15]:
dsl.query("""
  search publications
  where year in [2013:2018] and research_orgs="grid.258806.1"
  return researchers limit 1000
  """)
Returned Researchers: 1000
Time: 2.94s
[15]:
<dimcli.DslDataset object #4811691728. Records: 1000/5807>

Just make a mistake, and you will get the complete list of available facets

[16]:
dsl.query("""
search publications
return years
""")
Returned Errors: 1
Time: 0.74s
1 QueryError found
Semantic errors found:
        Facet 'years' is not present in source 'publications'. Available facets are: authors_count,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_sdg,category_uoa,funder_countries,funders,journal,journal_lists,mesh_terms,open_access,publisher,referenced_pubs,research_org_cities,research_org_countries,research_org_state_codes,research_orgs,researchers,source_title,times_cited,type,year
[16]:
<dimcli.DslDataset object #4811597088. Errors: 1>


Note

The Dimensions Analytics API lets you carry out sophisticated research data analytics tasks like the ones described on this website. Also check out the associated GitHub repository for examples, the source code of these tutorials and much more.
