Get-File-From-Job¶
This example demonstrates how to use the Mat3ra RESTful API to check for and retrieve files from jobs that have been run. It assumes that the user is already familiar with creating and submitting jobs using our API.
IMPORTANT NOTE: In order to run this example in full, an active Mat3ra.com account is required. Alternatively, readers may substitute the workflow ID below with another one (an equivalent one for VASP, for example) and adjust the extraction of results accordingly (see the "Viewing job files" section). The RESTful API credentials should be updated in the settings.
Steps¶
After working through this notebook, you will be able to:
- Import the structure of Si from Materials Project
- Set up and run a single-point calculation using Quantum Espresso
- List the files currently in the job's directory
- Check the metadata for every file (modification date, size, etc.)
- Access file contents directly and print them to the console
- Download files to your local machine
Pre-requisites¶
The explanation below assumes that the reader is familiar with the concepts used on the Mat3ra platform and its RESTful API. We outline these below and direct the reader to the original sources of information.
Complete Authorization Form and Initialize Settings¶
This will also determine the environment and set all environment variables. We determine whether we are running this tutorial in Jupyter Notebook or Google Colab.
If you are running this notebook from Google Colab, Colab takes ~1 min to execute the following cell.
ACCOUNT_ID and AUTH_TOKEN - Authentication parameters needed when making requests to Mat3ra.com's API endpoints.
MATERIALS_PROJECT_API_KEY - Authentication parameter needed when making requests to the Materials Project API.
ORGANIZATION_ID - Authentication parameter needed when working with collaborative accounts: https://docs.mat3ra.com/collaboration/organizations/overview/
NOTE: If you are running this notebook from Jupyter, the variables ACCOUNT_ID, AUTH_TOKEN, MATERIALS_PROJECT_API_KEY, and ORGANIZATION_ID should be set in the file settings.json if you need to use them. To obtain your API token parameters, see the documentation explaining how to get them: https://docs.mat3ra.com/accounts/ui/preferences/api/
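For reference, a minimal sketch of what settings.json might contain is shown below. The key names mirror the variables above; the exact file layout is an assumption here, so consult the documentation linked above for the authoritative format.
{
    "ACCOUNT_ID": "<your account ID>",
    "AUTH_TOKEN": "<your API token>",
    "MATERIALS_PROJECT_API_KEY": "<your Materials Project API key>",
    "ORGANIZATION_ID": "<your organization ID, if applicable>"
}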
# @title Authorization Form
ACCOUNT_ID = "ACCOUNT_ID" # @param {type:"string"}
AUTH_TOKEN = "AUTH_TOKEN" # @param {type:"string"}
MATERIALS_PROJECT_API_KEY = "MATERIALS_PROJECT_API_KEY" # @param {type:"string"}
ORGANIZATION_ID = "ORGANIZATION_ID" # @param {type:"string"}
import os
if "COLAB_JUPYTER_IP" in os.environ:
os.environ.update(
dict(
ACCOUNT_ID=ACCOUNT_ID,
AUTH_TOKEN=AUTH_TOKEN,
MATERIALS_PROJECT_API_KEY=MATERIALS_PROJECT_API_KEY,
ORGANIZATION_ID=ORGANIZATION_ID,
)
)
!GIT_BRANCH="dev"; export GIT_BRANCH; curl -s "https://raw.githubusercontent.com/Exabyte-io/api-examples/${GIT_BRANCH}/scripts/env.sh" | bash
Imports¶
# Import settings file and utils file
from utils.settings import ENDPOINT_ARGS, ACCOUNT_ID, MATERIALS_PROJECT_API_KEY
from utils.generic import (
    wait_for_jobs_to_finish,
    get_property_by_subworkflow_and_unit_indicies,
    dataframe_to_html,
    display_JSON,
)
# Relevant functions from the API client
from exabyte_api_client.endpoints.jobs import JobEndpoints
from exabyte_api_client.endpoints.projects import ProjectEndpoints
from exabyte_api_client.endpoints.materials import MaterialEndpoints
from exabyte_api_client.endpoints.bank_workflows import BankWorkflowEndpoints
from exabyte_api_client.endpoints.raw_properties import RawPropertiesEndpoints
Create and submit the job¶
For this job, we'll use the workflow located here.
This workflow is a single-point total energy calculation using Density Functional Theory as implemented in Quantum Espresso version 5.4.0.
The PBE functional is used in conjunction with an ultrasoft pseudopotential and a plane-wave basis set.
The material we will investigate is elemental silicon, imported as-is from Materials Project.
Note: This cell uses our API to copy the unit cell of silicon from Materials Project into your account. It then copies into your account a workflow that computes the total energy of a system using Quantum Espresso. Finally, a job is created that applies this workflow to the silicon unit cell, and the job is submitted to the cluster. For more information, please refer to our run-simulation-and-extract-properties notebook, located in this directory.
# Get some account information
project_endpoints = ProjectEndpoints(*ENDPOINT_ARGS)
project_metadata = project_endpoints.list({"isDefault": True, "owner._id": ACCOUNT_ID})[0]
project_id = project_metadata["_id"]
owner_id = project_metadata["owner"]["_id"]
# Get a workflow for the job from the bank, and copy it to our account
bank_workflow_endpoints = BankWorkflowEndpoints(*ENDPOINT_ARGS)
BANK_WORKFLOW_ID = "84DAjE9YyTFndx6z3"
workflow_id = bank_workflow_endpoints.copy(BANK_WORKFLOW_ID, owner_id)["_id"]
# Get materials for the job
material_endpoints = MaterialEndpoints(*ENDPOINT_ARGS)
material_project_id = ["mp-149"] # The importer expects a list
materials = material_endpoints.import_from_materialsproject(MATERIALS_PROJECT_API_KEY, material_project_id, owner_id)
# Create the job
job_endpoints = JobEndpoints(*ENDPOINT_ARGS)
job = job_endpoints.create_by_ids(
materials=materials, workflow_id=workflow_id, project_id=project_id, owner_id=owner_id, prefix="Test_Job_Output"
)[0]
# Submit the job
job_endpoints.submit(job["_id"])
wait_for_jobs_to_finish(job_endpoints, [job["_id"]])
Wait for jobs to finish, poll interval: 10 sec
+---------------------+------------------+---------------+-----------------+----------------+
| TIME                | SUBMITTED-JOBS   | ACTIVE-JOBS   | FINISHED-JOBS   | ERRORED-JOBS   |
+=====================+==================+===============+=================+================+
| 2023-07-28-15:18:16 | 1                | 0             | 0               | 0              |
+---------------------+------------------+---------------+-----------------+----------------+
... (intermediate polls omitted; the job moved from SUBMITTED to ACTIVE at 15:19:29) ...
+---------------------+------------------+---------------+-----------------+----------------+
| TIME                | SUBMITTED-JOBS   | ACTIVE-JOBS   | FINISHED-JOBS   | ERRORED-JOBS   |
+=====================+==================+===============+=================+================+
| 2023-07-28-15:20:00 | 0                | 0             | 1               | 0              |
+---------------------+------------------+---------------+-----------------+----------------+
Viewing job files¶
We monitored the job and printed its status until it finished. Now we can list the files in the job's directory, skipping everything under the Quantum Espresso outdir.
files = job_endpoints.list_files(job["_id"])
paths = [file["key"] for file in files]
for path in paths:
    if "outdir" not in path:
        print(path)
/cluster-001-home/demo/data/demo/test-job-output-si-207-F6DrciP3aTGyqd4Yk/.exabyte/92538.master-production-20160630-cluster-001.exabyte.io
/cluster-001-home/demo/data/demo/test-job-output-si-207-F6DrciP3aTGyqd4Yk/.exabyte/checkpoint
/cluster-001-home/demo/data/demo/test-job-output-si-207-F6DrciP3aTGyqd4Yk/.exabyte/job.rms
/cluster-001-home/demo/data/demo/test-job-output-si-207-F6DrciP3aTGyqd4Yk/.exabyte/machines-92538.master-production-20160630-cluster-001.exabyte.io
/cluster-001-home/demo/data/demo/test-job-output-si-207-F6DrciP3aTGyqd4Yk/.exabyte/rupy.log
/cluster-001-home/demo/data/demo/test-job-output-si-207-F6DrciP3aTGyqd4Yk/F6DrciP3aTGyqd4Yk.json
/cluster-001-home/demo/data/demo/test-job-output-si-207-F6DrciP3aTGyqd4Yk/job.log
/cluster-001-home/demo/data/demo/test-job-output-si-207-F6DrciP3aTGyqd4Yk/pseudo/si_pbe_gbrv_1.0.upf
/cluster-001-home/demo/data/demo/test-job-output-si-207-F6DrciP3aTGyqd4Yk/pw_scf.in
/cluster-001-home/demo/data/demo/test-job-output-si-207-F6DrciP3aTGyqd4Yk/pw_scf.out
Get metadata for the Output File¶
The .out file is where Quantum Espresso shows its work and prints its results, so you will most likely want to view this file. Let's print out some of its metadata.
You'll find that we get a lot of data describing the file and its provenance. Brief explanations of each entry are given below; a short sketch for rendering a few of these fields in human-readable form follows the list.
- key - Path to the file on the cluster.
- size - Size of the file, in bytes.
- bucket - The name of the cluster that ran the job.
- region - The server region in which the job was run.
- provider - The provider of the compute resources (in our case, AWS).
- lastModified - Unix timestamp representing when the file was last modified.
- name - The filename.
- signedUrl - A time-limited link that can be used to download the file.
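As a quick illustration, here is a minimal sketch that prints the size and modification time of every file in a human-readable way. The summarize_file helper is hypothetical (not part of the API client), and it assumes lastModified is a Unix timestamp in seconds; if your platform reports milliseconds, divide by 1000 first.
from datetime import datetime, timezone

def summarize_file(file_metadata):
    # Hypothetical helper; assumes "lastModified" is a Unix timestamp in seconds
    modified = datetime.fromtimestamp(file_metadata["lastModified"], tz=timezone.utc)
    size_kb = file_metadata["size"] / 1024
    print(f"{file_metadata['name']}: {size_kb:.1f} KiB, last modified {modified.isoformat()}")

for file in files:
    summarize_file(file)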
for file in files:
    if file["name"] == "pw_scf.out":
        output_file_metadata = file

display_JSON(output_file_metadata)
Display file contents to console¶
The signedUrl gives us a place to access the file and download it. Let's read it into memory, and print out the last few lines of our job.
import urllib.request
server_response = urllib.request.urlopen(output_file_metadata["signedUrl"])
output_file_bytes = server_response.read()
# The server returns us a bytes-string. That's useful for things like binaries or other non-human-readable data, but this should be decoded if we're planning to write to console.
# Because this is a human-readable text file, we'll decode it to UTF-8.
output_file = output_file_bytes.decode(encoding="UTF-8")
# Tail the last 90 lines
lines = output_file.split("\n")
for line in lines[-90:]:
    print(line)
    -1.5106  -1.5106   3.4108   3.4108   6.9197   6.9197  16.1487  16.1487

     the Fermi energy is     6.6081 ev

!    total energy              =     -19.00890069 Ry
     Harris-Foulkes estimate   =     -19.00890026 Ry
     estimated scf accuracy    <       0.00000057 Ry

     The total energy is the sum of the following terms:

     one-electron contribution =       5.04610005 Ry
     hartree contribution      =       1.30263425 Ry
     xc contribution           =      -8.67765820 Ry
     ewald contribution        =     -16.67997678 Ry
     smearing contrib. (-TS)   =      -0.00000000 Ry

     convergence has been achieved in   6 iterations

     Forces acting on atoms (Ry/au):

     atom    1 type  1   force =    -0.00000037    0.00000042    0.00000000
     atom    2 type  1   force =     0.00000037   -0.00000042    0.00000000

     Total force =     0.000001     Total SCF correction =     0.000002
     SCF correction compared to forces is large: reduce conv_thr to get better values

     entering subroutine stress ...

          total   stress  (Ry/bohr**3)                   (kbar)     P=   73.74
   0.00050127   0.00000001   0.00000000         73.74      0.00      0.00
   0.00000001   0.00050125  -0.00000000          0.00     73.74     -0.00
   0.00000000  -0.00000000   0.00050127          0.00     -0.00     73.74

     Writing output data file __prefix__.save

     init_run     :      0.29s CPU      0.32s WALL (       1 calls)
     electrons    :      1.56s CPU      1.84s WALL (       1 calls)
     forces       :      0.09s CPU      0.11s WALL (       1 calls)
     stress       :      0.31s CPU      0.35s WALL (       1 calls)

     Called by init_run:
     wfcinit      :      0.05s CPU      0.05s WALL (       1 calls)
     potinit      :      0.03s CPU      0.04s WALL (       1 calls)

     Called by electrons:
     c_bands      :      0.87s CPU      0.88s WALL (       6 calls)
     sum_band     :      0.39s CPU      0.51s WALL (       6 calls)
     v_of_rho     :      0.16s CPU      0.17s WALL (       7 calls)
     newd         :      0.17s CPU      0.32s WALL (       7 calls)
     mix_rho      :      0.02s CPU      0.02s WALL (       6 calls)

     Called by c_bands:
     init_us_2    :      0.02s CPU      0.02s WALL (      90 calls)
     cegterg      :      0.77s CPU      0.78s WALL (      36 calls)

     Called by sum_band:
     sum_band:bec :      0.00s CPU      0.00s WALL (      36 calls)
     addusdens    :      0.21s CPU      0.33s WALL (       6 calls)

     Called by *egterg:
     h_psi        :      0.75s CPU      0.76s WALL (     127 calls)
     s_psi        :      0.01s CPU      0.01s WALL (     127 calls)
     g_psi        :      0.00s CPU      0.00s WALL (      85 calls)
     cdiaghg      :      0.01s CPU      0.01s WALL (     121 calls)

     Called by h_psi:
     add_vuspsi   :      0.01s CPU      0.01s WALL (     127 calls)

     General routines
     calbec       :      0.02s CPU      0.02s WALL (     193 calls)
     fft          :      0.13s CPU      0.13s WALL (     133 calls)
     ffts         :      0.01s CPU      0.01s WALL (      13 calls)
     fftw         :      0.78s CPU      0.79s WALL (    1912 calls)
     interpolate  :      0.02s CPU      0.02s WALL (      13 calls)

     Parallel routines
     fft_scatter  :      0.05s CPU      0.05s WALL (    2058 calls)

     PWSCF        :     2.30s CPU         3.19s WALL

   This run was terminated on:  15:19:31  28Jul2023

=------------------------------------------------------------------------------=
   JOB DONE.
=------------------------------------------------------------------------------=
Save the input and output files to disk¶
Now that we've verified the job is done, let's go ahead and save its output and input files to disk.
# We've already got an output file, so let's grab the input file we sent to Quantum Espresso
for file in files:
    if file["name"] == "pw_scf.in":
        input_file_metadata = file
server_response = urllib.request.urlopen(input_file_metadata["signedUrl"])
input_file_bytes = server_response.read()
# Let's write the input file to disk. Note that we get files as a bytes string from the server, which is convenient for binaries, images, and other non-human-readable data.
# Although we could decode before writing to disk, we can just write it directly with the "wb" (write bytes) file mode.
with open(input_file_metadata["name"], "wb") as file_descriptor:
    file_descriptor.write(input_file_bytes)
# Now, let's write our output file to the disk. Note that because we already decoded it, we can just use the 'w' file mode.
with open(output_file_metadata["name"], "w") as file_descriptor:
    file_descriptor.write(output_file)
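If you would rather pull down every file at once instead of one at a time, a minimal sketch along the following lines would work, under the same assumptions used above: each entry returned by list_files carries the name, key, and signedUrl fields, and signed URLs expire after a short time, so re-fetch the file list if downloads start failing. The job_files directory name is arbitrary.
import os
import urllib.request

# A sketch: download every non-outdir file from the job's directory
download_dir = "job_files"  # hypothetical local target directory
os.makedirs(download_dir, exist_ok=True)

for file in job_endpoints.list_files(job["_id"]):
    if "outdir" in file["key"]:
        continue  # skip bulky wavefunction/charge-density data
    local_path = os.path.join(download_dir, file["name"])
    with urllib.request.urlopen(file["signedUrl"]) as response:
        with open(local_path, "wb") as file_descriptor:
            file_descriptor.write(response.read())
    print(f"Saved {file['name']} to {local_path}")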