2021.10 Release Notes
askcos-site
- Update drawing endpoints to use new drawing functions (MR askcos-site!99)
- This updates both the API drawing endpoints and the legacy drawing views to use the rewritten drawing functions, and also adds consistent support for the transparency and svg options for all input types.
- Functional improvements to tree builder (MR askcos-site!100)
- Display more detailed statistics for tree builder jobs including timings
- Add support for reaction classification
- Enable alternative termination options for v1 tree builder
- Add support for sorting trees by starting material cost
- Update fingerprint generation in TFXFastFilter (MR askcos-site!101)
- Following changes to the fingerprinting module in askcos-core!61.
- Bug fixes for IPP saved results and template view (MR askcos-site!102)
- Extract target SMILES
- Update template example counts
- Refresh computed properties (for cluster view)
- Fix reaction display for custom template sets (Issue askcos-site#64)
- Add support for retrosynthesis planning using multiple template sets (MR askcos-site!103)
- Updates to the retro and tree-builder API endpoints to support a new template_prioritizers argument, which accepts a list of config objects including template_set, version, and attribute_filter (retro only) fields
- For one-step retro predictions, multi-prioritizer predictions are implemented at the celery task level, where individual predictions are performed in parallel by separate tasks, and the results are then combined and re-ranked appropriately by a final task
- Frontend changes include updates to the settings modal to enable configuring multiple template prioritizers, as well as the ability to combine results from separate predictions, i.e. "re-expanding" a node using different settings
- Improvements to tree builder result view and analysis (MR askcos-site!104)
- Adds ability to perform pathway ranking and reaction classification for completed jobs, along with new API endpoint for tree builder result analysis tasks
- Adds filtering options to the result interface, currently including filters for starting materials, intermediates, and reaction classes (if reaction classification was done)
- Adds new sorting option for starting material cost
- Improves rounding method for displaying scores to use significant figures instead of fixed decimals
- Add support for reaction network optimization (MR askcos-site!105)
- Adds API endpoint and celery worker for reaction network optimization
- Adds option to find optimal path on tree builder results page
- Update tutorial and documentation (MR askcos-site!106)
- Revises tutorial page with up-to-date information on current ASKCOS features
- Adds new Quick Reference button to navbar which brings up a glossary of definitions
- Updates API documentation and test cases
- Fixes to number formatting in tree view
- Fixes to atom mapping visualization on homepage
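The template_prioritizers argument introduced in MR askcos-site!103 can be pictured as a list of small config objects. The following is a minimal Python sketch of how such a request body might be assembled. Only the field names template_set, version, and attribute_filter come from the notes above; the helper name make_prioritizer, the example values, the empty attribute_filter, and the "target" key are assumptions for illustration and should be checked against the API documentation.

```python
import json


def make_prioritizer(template_set, version, attribute_filter=None):
    """Build one config object for the template_prioritizers list.

    attribute_filter applies to one-step retro requests only, so it is
    left out unless explicitly provided.
    """
    config = {"template_set": template_set, "version": version}
    if attribute_filter is not None:
        config["attribute_filter"] = attribute_filter
    return config


# Hypothetical example values; actual template set names and versions
# depend on the deployment.
payload = {
    "target": "CCO",  # assumed key name for the target SMILES
    "template_prioritizers": [
        make_prioritizer("pistachio", 1),
        make_prioritizer("bkms", 1, attribute_filter=[]),
    ],
}

print(json.dumps(payload, indent=2))
```

Keeping each prioritizer as its own object mirrors the description above, where each entry implicitly selects the template set that its model was trained on.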
askcos-core
- Rewrite drawing module and update dependencies (MR askcos-core!58, Issue ASKCOS#295)
- Nearly complete rewrite of the drawing module; also updates dependency versions, most notably RDKit
- Standardizes drawing functions to use RDKit's new drawing interface and adds consistent support for various options:
- Transparency (default on)
- Format (PNG/SVG/PIL.Image)
- Atom label coloring by element (default off)
- Fix accidental downgrade of pyyaml version (MR askcos-core!59)
- Introduced by MR askcos-core!58
- Fix softmax application for template relevance model (MR askcos-core!60)
- Fixes a major bug with the newer set of template relevance models where softmax activation was being applied twice (once by the model and once during post-processing), leading to scores on the order of 1e-5.
- Clean up and centralize fingerprint generation (MR askcos-core!61)
- Create new fingerprint generation utility functions in the existing askcos.utilities.fingerprinting module, and replace duplicated fingerprint generation code in various locations
- Functional improvements to tree builder (MR askcos-core!62)
- Add alternative stopping criteria to tree builder v1 (max iterations, chemicals, reactions, templates)
- Return more detailed statistics after tree builder job, including some timings
- Add support for classifying reactions following a tree builder job
- Major refactoring and cleanup of reaction classification code
- Fix bugs in updated drawing code (MR askcos-core!63)
- Introduced by MR askcos-core!58
- Prevent RDKit from preparing molecule again, so that options like kekulize are not ignored
- Change default to not kekulize template smarts when drawing
- Fail silently if atom highlights cannot be determined when drawing molecules
- Do not error if drawing an incomplete reaction (without reactants or products)
- Add support for retrosynthesis planning using multiple template sets (MR askcos-core!64)
- Adds support for handling multiple template sets to both tree builders, RetroTransformer, and ChemHistorian
- While the RetroTransformer has been updated to support loading multiple template sets, use of multiple template sets is still limited to the apply_one_template_by_idx function in order to support the tree builders
- Both the RetroTransformer and ChemHistorian can now load data from multiple files (each corresponding to a template set) if they are properly defined in global_config.py
- A notable configuration change is that 'template_set' is no longer an input option. Instead, the 'template_prioritizers' option implicitly determines the template set to use (also set in global_config.py), based on the assumption that a given template prioritization model is trained on a specific template set. This also means that the template relevance model is now the only supported template prioritizer (which has been the de facto case for a while)
- As a result, this introduces backwards incompatible changes to the signatures for the RetroTransformer and MCTS classes, where the template_set and template_prioritizer arguments have been replaced by template_prioritizers
- Catch TFFP prediction errors and return empty result (MR askcos-core!65, Issue askcos-site#66)
- Replace numpy calculations in EHS score code (MR askcos-core!66)
- Introduced by MR askcos-core!61
- Add reaction network optimization module (MR askcos-core!67)
- For evaluating reaction conditions for a tree builder generated reaction network and finding the pathway(s) with least material requirements
- Based on https://doi.org/10.1021/acs.jcim.0c01032
- Do not delete ppg if re-cleaning tree builder json (MR askcos-core!68)
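The double-softmax bug fixed in MR askcos-core!60 is easy to reproduce numerically. The sketch below is not the ASKCOS implementation: the function names, the handling of a normalized flag (modelled on the attribute added in MR askcos-deploy!82), and the template count of 100,000 are assumptions chosen to illustrate why applying softmax to already-normalized scores collapses them to roughly 1/N.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(scores, normalized):
    """Apply softmax only when the model output is not already normalized."""
    return list(scores) if normalized else softmax(scores)


n_templates = 100_000  # hypothetical template set size
logits = [0.0] * n_templates
logits[0] = 20.0  # one strongly preferred template

model_output = softmax(logits)  # the model already applies softmax
buggy = softmax(model_output)   # softmax applied a second time
fixed = postprocess(model_output, normalized=True)

print(f"correct top score: {max(fixed):.4f}")
print(f"double-softmax top score: {max(buggy):.2e}")
```

Because the already-normalized scores all lie between 0 and 1, the second softmax exponentiates nearly identical values, so even the best template ends up with a score on the order of 1e-5 for a template set of this size.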
askcos-data
- Update askcos-base image tag (MR askcos-data!24)
- Use new askcos-base version with updated dependencies.
- Add new BKMS and Pistachio ring-breaker models (MRs askcos-data!25, askcos-data!26, askcos-data!27)
- BKMS: Includes all chemical and reaction data, as well as links to the BKMS website
- Pistachio ring-breaker: Only includes new template set, shares chemical and reaction data with existing Pistachio model (Issue ASKCOS#343)
askcos-deploy
- Upload Helm chart releases to GitLab Helm chart repo (MR askcos-deploy!81)
- The existing upload to the GitLab docker registry uses Helm's experimental support for OCI-based registries. While the charts uploaded there are identical, the access method is different and may not be supported by various Kubernetes management platforms. Both uploads will be maintained for now.
- Add normalized attribute to template relevance model config (MR askcos-deploy!82)
- Indicates whether or not output is already normalized (see MR askcos-core!60)
- Add support for new BKMS and Pistachio ring-breaker models (MR askcos-deploy!83)
- Also updates behavior of deploy.sh seed-db to append data by default instead of dropping, which is much safer in case the --append argument was forgotten. Existing data can be dropped by explicitly specifying --drop.
- Start bkms and ringbreaker servers on deploy (MR askcos-deploy!84)
- Add graph optimizer celery workers to deployment (MR askcos-deploy!85)
Docker Compose Deployment
We currently support two methods for deploying ASKCOS: Docker Compose and Kubernetes. Docker Compose is a simpler method for deploying on a single workstation, while Kubernetes is more complex but is suitable for scaling across multiple nodes.
Hardware Requirements
To deploy ASKCOS with the default number of workers, we recommend using a server with at least 16 CPU cores and 64 GB memory. The default configuration uses approximately 45 GB memory at deploy, but usage will increase while running some compute tasks. ASKCOS is not currently set up to use GPUs for machine learning predictions.
For deployment on AWS, this corresponds to an m5.4xlarge instance or similar. (Note that ASKCOS does not work on ARM-based instances.)
For deployment on Google Cloud, this corresponds to an e2-standard-16 instance or similar.
If you plan to increase worker scales, you should increase hardware resources accordingly.
Finally, you should provision at least 40 GB of drive space for a basic deployment. More disk space is recommended for long-term deployments to store user data and support updates and custom models and data.
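As a quick pre-flight check against the recommendations above, the following sketch compares a machine's resources with the stated minimums (16 CPU cores, 64 GB memory, 40 GB disk). It is an illustration only and not part of askcos-deploy: the function names are hypothetical, the memory probe relies on os.sysconf names that are available on Linux, and free disk space is used as a stand-in for provisioned drive space.

```python
import os
import shutil

# Minimums taken from the recommendations above.
MIN_CORES = 16
MIN_MEMORY_GB = 64
MIN_DISK_GB = 40


def find_shortfalls(cores, memory_gb, disk_gb):
    """Return a list of human-readable shortfalls (empty if all are met)."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"CPU cores: {cores} < {MIN_CORES}")
    if memory_gb < MIN_MEMORY_GB:
        problems.append(f"memory: {memory_gb:.0f} GB < {MIN_MEMORY_GB} GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"free disk: {disk_gb:.0f} GB < {MIN_DISK_GB} GB")
    return problems


def probe_local_machine(path="/"):
    """Measure cores, total memory, and free disk space (Linux only)."""
    cores = os.cpu_count() or 0
    memory_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_free = shutil.disk_usage(path).free
    return cores, memory_bytes / 1e9, disk_free / 1e9


if __name__ == "__main__":
    for line in find_shortfalls(*probe_local_machine()) or ["all minimums met"]:
        print(line)
```

Separating the comparison from the probing keeps the thresholds easy to adjust if worker scales are increased.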
Software Prerequisites
To deploy ASKCOS using Docker Compose, you must have the following installed on your machine:
- git
- Docker (installation instructions)
- Docker Compose (installation instructions)
Quickstart
ASKCOS can be downloaded using deploy tokens, which provide read-only access to the source code and our container registry in GitLab. Below is a complete example showing how to deploy the ASKCOS application using deploy tokens (omitted in this example). The deploy tokens can be found on the MLPDS Member Resources ASKCOS Versions Page.
$ export DEPLOY_TOKEN_USERNAME=
$ export DEPLOY_TOKEN_PASSWORD=
$ docker login registry.gitlab.com -u $DEPLOY_TOKEN_USERNAME -p $DEPLOY_TOKEN_PASSWORD
$ git clone https://$DEPLOY_TOKEN_USERNAME:$DEPLOY_TOKEN_PASSWORD@gitlab.com/mlpds_mit/askcos/askcos-deploy.git
$ cd askcos-deploy
$ git checkout 2021.10
$ bash deploy.sh deploy
Upgrade Information
The askcos-deploy repository also provides scripts to upgrade an existing ASKCOS deployment in-place.
$ git checkout 2021.10
$ bash deploy.sh update -v 2021.10
Some releases include changes or additions which require further action. Depending on the version you are upgrading from, you may need to perform one or more of the following steps.
New in 2021.10
The 2021.10 release includes two new template relevance models: one trained on enzymatic reactions from BKMS, and one trained on ring-breaking reactions from Pistachio. In order to use the new models, the associated data will need to be imported into MongoDB:
$ bash deploy.sh seed-db -r bkms -c bkms -x bkms
$ bash deploy.sh seed-db -r ringbreaker
Note that the deploy.sh script has been updated to change the default behavior for seed-db to append new data. To drop all existing data (the previous behavior), you can pass the --drop argument.
Notes from earlier releases
The 2021.07 release includes a new template relevance model trained on a SciFinder/CAS template set. In order to use the new model, you will need to import the reaction templates into MongoDB:
$ bash deploy.sh seed-db -r cas --append
!>Please note that chemical historian data is not included for the CAS model at this time, so chemical popularity information will not be available for tree builder jobs using the CAS model.
The 2021.07 release also introduces a model serving configuration file, located at askcos-deploy/model_config.yaml. Models deployed using TensorFlow Serving or TorchServe must be added to the configuration file to provide connection parameters to ASKCOS. Any existing custom models should be added to ensure they work after updating.
The 2021.04 release includes new reference data for the Pistachio template relevance model and the aromatic site selectivity model. The reference data must be imported into MongoDB to take advantage of the new features:
$ bash deploy.sh seed-db -e default -x default
The 2020.10 release included a new Pistachio template set and template relevance model. If you have not already done so, you will need to seed some new data into the mongo database:
$ bash deploy.sh seed-db -c pistachio -r pistachio --append
The 2020.10 release also included an updated set of default buyables data. If you have not already done so, you can import the new data using the following command:
$ bash deploy.sh seed-db -b default --append
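Note that this will result in some duplicate data. If you have not added custom buyables data, you can instead drop the existing buyables database and import the updated data by passing --drop in place of --append. (As of 2021.10, seed-db appends by default, so omitting --append no longer drops existing data.)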
!>In some cases, we have seen issues with resetting rabbitmq data while upgrading from 2020.07 to 2020.10. If you see celery workers restarting after updating and inequivalent arg 'x-max-priority' errors in worker logs, you should restart rabbitmq again using docker-compose rm -fsv rabbit && docker-compose up -d rabbit.
The 2020.07 release introduced new index types which significantly improved database lookup performance. If you have not already done so, you should re-index the database using the following command:
$ bash deploy.sh index-db --drop-indexes
First Time Deployment
Deploying the Web Application
Deployment is initiated by a bash script that runs a few docker-compose commands in a specific order. Several database services need to be started first, and more importantly seeded with data, before other services (which rely on the availability of data in the database) can start. The deploy.sh script is provided in the askcos-deploy repository and should be run as follows:
$ bash deploy.sh command [optional arguments]
For a full list of available commands and options, use the help command.
There are a number of available commands for common deploy tasks:
- deploy: runs standard first-time deployment tasks, including seed-db
- update: pulls new docker image from GitLab repository and restarts all services
- seed-db: seed the database with default or custom data files
- start: start a deployment without performing first-time tasks
- stop: stop a running deployment
- clean: stop a running deployment and remove all docker containers and volumes
For a running deployment, new data can be seeded into the database using the seed-db command along with arguments indicating the types of data to be seeded. Note that as of 2021.10 this appends to the existing data in the database by default; pass --drop to replace the existing data instead. The available arguments are as follows:
- -b, --buyables: specify buyables data to seed, either default or path to data file
- -c, --chemicals: specify chemicals data to seed, either default or path to data file
- -x, --reactions: specify reactions data to seed, either default or path to data file
- -r, --retro-templates: specify retrosynthetic templates to seed, either default or path to data file
- -f, --forward-templates: specify forward templates to seed, either default or path to data file
- -e, --references: specify model reference data to seed, only supports default currently
For example, to seed default buyables data and custom retrosynthetic templates, run the following from the deploy folder:
$ bash deploy.sh seed-db --buyables default --retro-templates /path/to/my.retro.templates.json.gz
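When seeding is scripted, it can help to assemble the seed-db invocation programmatically. The sketch below maps the documented options onto an argument list suitable for subprocess. The function name build_seed_db_command and its keyword names are hypothetical; only the flags listed above and the --drop argument from the upgrade notes are used.

```python
import shlex

# Long option names as documented for `deploy.sh seed-db`.
SEED_OPTIONS = {
    "buyables": "--buyables",
    "chemicals": "--chemicals",
    "reactions": "--reactions",
    "retro_templates": "--retro-templates",
    "forward_templates": "--forward-templates",
    "references": "--references",
}


def build_seed_db_command(drop=False, **data):
    """Return the argument list for a `bash deploy.sh seed-db ...` call.

    Each keyword value is either "default" or a path to a data file.
    """
    unknown = set(data) - set(SEED_OPTIONS)
    if unknown:
        raise ValueError(f"unknown data type(s): {sorted(unknown)}")
    command = ["bash", "deploy.sh", "seed-db"]
    for key, flag in SEED_OPTIONS.items():
        if key in data:
            command += [flag, data[key]]
    if drop:
        command.append("--drop")  # replace existing data instead of appending
    return command


command = build_seed_db_command(
    buyables="default",
    retro_templates="/path/to/my.retro.templates.json.gz",
)
print(shlex.join(command))
# To execute from the askcos-deploy folder: subprocess.run(command, check=True)
```

Building a list rather than a single string avoids shell quoting problems when data file paths contain spaces.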
To update a deployment, run the following from the deploy folder:
$ bash deploy.sh update --version x.y.z
To stop a currently running application, run the following from the deploy folder:
$ bash deploy.sh stop
If you would like to clean up and remove everything from a previous deployment (NOTE: you will lose user data), run the following from the deploy folder:
$ bash deploy.sh clean
Backing Up User Data
If you are upgrading from v0.3.1 or later, the backup/restore process is no longer needed unless you are moving deployments to a new machine.
New backup and restore functions were added in askcos-deploy 2020.07 to provide more robust backup/restore capabilities based on Docker volumes. The commands can be used whether the site is running or not; the only requirement is that the mongo_data and mysql_data Docker volumes exist.
To back up:
$ bash deploy.sh backup [-d /absolute/path/to/backup/dir]
To restore:
$ bash deploy.sh restore [-d /absolute/path/to/backup/dir]
!>Note: These backup and restore processes are run in a bare alpine linux image which will be automatically pulled by Docker.
Add Customization
There are a few parts of the application that you can customize:
- Header sub-title next to ASKCOS (to designate this as a local deployment at your organization)
- Email addresses for the support form
- Whether to enable the chemical name to SMILES resolver
- Whether authorization is required to modify the buyables database
- Add internal URL to a Pistachio web app deployment to enable direct links
These are handled as environment variables that can be changed at deployment (and are therefore not tied to the image directly). They can be found in the customization file, which is created automatically during deployment from the customization.example file.
In addition, the following methods enable more substantial customizations to the ASKCOS website without rebuilding the askcos-site image:
- Customization of Django site settings
- Include customizations in the askcos-deploy/custom_django_settings.py file which is mounted to /usr/local/askcos-site/askcos_site/custom_settings.py in the app container
- Customization of web frontend
- Include custom script or css tags in a custom_head.html Django template file which is mounted to /usr/local/askcos-site/askcos_site/templates/custom_head.html and included in the <head> section of every page
Please let us know what other degrees of customization you would like.
Managing Django
If you'd like to manage the Django app (i.e., run python manage.py ...), for example to create an admin superuser, you can run commands in the running app service as follows:
$ docker-compose exec app bash -c "python /usr/local/askcos-site/manage.py createsuperuser"
In this case you'll be presented an interactive prompt to create a superuser with your desired credentials.
Scaling Workers
Only 1 worker per queue is deployed by default with limited concurrency. This is not ideal for many-user demand. The scaling of each worker is defined at the top of the deploy.sh script. To scale a desired worker, change the appropriate value in deploy.sh, for example:
n_tb_c_worker=N # Tree builder chiral worker
where N is the number of workers you want. Then run bash deploy.sh start [-v <version>].
Kubernetes Deployment
ASKCOS 2021.10 includes a Helm chart to make it easier to deploy ASKCOS on Kubernetes. The previous Kubernetes configuration can still be used for 2020.07 or earlier but will no longer be updated.
Hardware Requirements
To deploy ASKCOS with the default number of workers, we recommend using a server with at least 16 CPU cores and 64 GB memory combined across nodes, and individual nodes with at least 16 GB memory. The default configuration uses approximately 45 GB memory total at deploy, with the most resource intensive worker needing about 14 GB, but usage will increase while running some compute tasks. ASKCOS is not currently set up to use GPUs for machine learning predictions.
For deployment on AWS, this corresponds to one m5.4xlarge instance or two m5.2xlarge instances. (Note that ASKCOS does not work on ARM-based instances.)
For deployment on Google Cloud, this corresponds to an e2-standard-16 instance or two e2-standard-8 instances.
If you plan to increase worker scales, you should increase hardware resources accordingly.
Software Prerequisites
In addition to git and Docker, we will assume that you are using a cluster which already has Kubernetes configured. You will also need to install Helm 3: https://helm.sh/docs/intro/install/.
Quickstart
Similar to the Docker Compose deployment, you will need to obtain the ASKCOS deploy tokens in order to clone the askcos-deploy repository and access the GitLab image registry. The deploy tokens can be found on the MLPDS Member Resources ASKCOS Versions Page.
$ export DEPLOY_TOKEN_USERNAME=
$ export DEPLOY_TOKEN_PASSWORD=
$ git clone https://$DEPLOY_TOKEN_USERNAME:$DEPLOY_TOKEN_PASSWORD@gitlab.com/mlpds_mit/askcos/askcos-deploy.git
$ cd askcos-deploy
$ git checkout 2021.10
$ helm install --set imageCredentials.username=$DEPLOY_TOKEN_USERNAME --set imageCredentials.password=$DEPLOY_TOKEN_PASSWORD mydeploy ./helm/askcos
For more configuration options, please check out the values file at askcos-deploy/helm/askcos/values.yaml.
Add Customization
For Kubernetes, the same customizations can be applied as for the Docker Compose deployment:
- Header sub-title next to ASKCOS (to designate this as a local deployment at your organization)
- Email addresses for the support form
- Whether to enable the chemical name to SMILES resolver
- Whether authorization is required to modify the buyables database
The environment variables for these customizations can be adjusted in the env block of the values.yaml file.
Managing Django
If you'd like to manage the Django app (i.e., run python manage.py ...), for example to create an admin superuser, you can run commands in the running app container as follows:
$ kubectl exec [ASKCOS POD] -c app -i -t -- python /usr/local/askcos-site/manage.py createsuperuser
In this case you'll be presented an interactive prompt to create a superuser with your desired credentials.
Scaling Workers
For Kubernetes, worker replicas can also be set in the values.yaml file. Celery workers are defined in the celery block as a list, and each item has a replicaCount field for setting the number of replicas.
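The layout described above can be pictured as follows. This is a sketch only: it models the parsed values.yaml as a plain Python dict, and the worker names and the name key used to identify each entry are assumptions. Only the celery list and the replicaCount field come from the text, so check askcos-deploy/helm/askcos/values.yaml for the actual structure.

```python
import copy


def set_replicas(values, worker_name, replicas):
    """Return a copy of `values` with replicaCount updated for one worker.

    Assumes each item in values["celery"] carries a "name" key
    (hypothetical) alongside its replicaCount field.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    updated = copy.deepcopy(values)
    for worker in updated["celery"]:
        if worker.get("name") == worker_name:
            worker["replicaCount"] = replicas
            return updated
    raise KeyError(f"no celery worker named {worker_name!r}")


# Hypothetical excerpt of a parsed values.yaml.
values = {
    "celery": [
        {"name": "tb-c-worker", "replicaCount": 1},
        {"name": "tb-coordinator", "replicaCount": 1},
    ]
}

scaled = set_replicas(values, "tb-c-worker", 3)
print(scaled["celery"][0])
```

Returning a modified copy leaves the original values untouched, which makes it easier to compare the default and scaled configurations before applying them.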
(Optional) Building Docker Images
If you would like to build the askcos-site Docker image yourself, you will need to download the appropriate repositories depending on where you want to start.
To only build askcos-site using a pre-built askcos-core image:
$ git clone https://gitlab.com/mlpds_mit/askcos/askcos-site
$ cd askcos-site
$ make [TAG=my_tag]
A Makefile is provided to make it easier to build the image with a default image name. You can also use the docker build command directly:
$ docker build -t <image name>:<tag> .
!>Note: The image name should correspond with what exists in the docker-compose.yml file. By default, the image name is the value of the ASKCOS_IMAGE_REGISTRY environment variable followed by askcos-site. If you choose to use a custom image name, make sure to modify the ASKCOS_IMAGE_REGISTRY variable or the docker-compose.yml file accordingly. For Kubernetes deployment, the image registry and tag are defined in the values.yaml file.
Similarly, if you also want to build askcos-core:
$ git clone https://gitlab.com/mlpds_mit/askcos/askcos-core
$ cd askcos-core
$ make [TAG=my_tag]
Note that you will need to specify the appropriate askcos-core version when building askcos-site afterwards:
$ cd askcos-core
$ make TAG=my_tag
$ cd ../askcos-site
$ make CORE_VERSION=my_tag TAG=my_tag
ASKCOS Development
Software package for the prediction of feasible synthetic routes towards a desired compound and associated tasks related to synthesis planning. Originally developed under the DARPA Make-It program and now being developed under the MLPDS Consortium.