2021.07 Release Notes

askcos-site

User notes:

  • Tree builder jobs now report "started" state (MR askcos-site!80)
  • Report length of pending tasks queue on server status page (MR askcos-site!80)
  • Improve tooltips, input fields, and other interface elements across site (MR askcos-site!82)
  • Add ability to view tree builder job settings from results page (MR askcos-site!82)
  • Add confirmation prompt before running new one-step prediction in IPP with existing results (MR askcos-site!87)
  • Add support for ML serving model configuration file (Issues askcos-site#5, askcos-site#17, MRs askcos-site!86, askcos-site!95, Epic &30)
  • Improve tag input field for saving IPP results (MR askcos-site!91)
  • Display saved result info when viewing in IPP and tree results page (Issues askcos-site#43, askcos-site#62, MR askcos-site!91)
  • Show database status on server status page (Issue ASKCOS#243, MR askcos-site!91)
  • Add ability to resubmit failed tree builder jobs (Issue ASKCOS#208, MR askcos-site!91)
  • Add support for interfacing with openretro (Issue askcos-site#52, MR askcos-site!94, Epic &10)
  • Add new retrosynthesis page for comparing results from different models (Issue askcos-site#53, MR askcos-site!92, Epic &10)

Developer notes:

  • Saved result documents are created at submission instead of completion (MR askcos-site!80) [Note 1]
  • Relocate wsgi.py to location specified in Django settings (MR askcos-site!80)
  • Migrate to "standalone" vue application (MR askcos-site!82)
  • Switch from bootstrap.js/jquery to bootstrap-vue (MR askcos-site!82)
  • Switch from bootstrap-tourist.js to shepherd.js for interactive tours (MR askcos-site!82)
  • Enable ManifestStaticFilesStorage for better browser cache management (MR askcos-site!82)
  • Add support for including custom code in webpage head block (Issues askcos-site#26, askcos-site#59, MR askcos-site!91)
  • Migrate some IPP data to vuex to enable sharing data between components (MR askcos-site!90)
  • Use askcos-site specific version of askcos-data image (MR askcos-site!93)

Bug fixes:

  • Fix broken delete button in results page and hide share and delete buttons for pending results (MR askcos-site!80) [Note 1]
  • Fix add precursors button by checking if precursor has templates before accessing (Issue ASKCOS#347, MR askcos-site!81) [Note 1]
  • Improve error handling when saving tree builder results (MR askcos-site!83) [Note 1]
  • Fix mouse translation issue with Ketcher modal (Issues askcos-site#7, askcos-site#39, MR askcos-site!82)
  • Fix tree builder version select input type coercion in IPP (MR askcos-site!85)
  • Fix some user interface bugs in IPP and site selectivity module (MR askcos-site!87)
  • Fix bugs with recommended templates modal and tree builder import (MR askcos-site!88)
  • Increase footer z-index (MR askcos-site!89)
  • Fix confirmation before clearing IPP results (MR askcos-site!91)

Deprecation:

  • Viewing of saved HTML results is deprecated and will be removed in a future release. Such results can be identified from the results page as having a type of html, or from the SavedResults table in the MySQL database as having a result_type of html. Please contact us with questions or suggestions for managing this change.
  • In the retro prediction API endpoint, the template_set parameter has been superseded by training_set, and the template_set_version parameter has been superseded by model_version. The old parameter names will continue to work but will be removed in a future release.
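
Client scripts that still send the old parameter names can be migrated mechanically. The following is a minimal sketch, assuming the parameters are passed as a URL query string. The helper name migrate_retro_params and the sample values are illustrative only; the four parameter names are the only part taken from the notes above.

```bash
# Rewrite deprecated retro API parameter names in a query string read from stdin.
# Hypothetical helper: not part of ASKCOS.
migrate_retro_params() {
  sed -E \
    -e 's/(^|[?&])template_set_version=/\1model_version=/g' \
    -e 's/(^|[?&])template_set=/\1training_set=/g'
}

# Example (sample values are placeholders):
#   printf 'template_set=reaxys&template_set_version=1\n' | migrate_retro_params
```

Matching on the leading delimiter keeps the rewrite from touching other parameters whose names merely end in template_set.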

[Note 1] These fixes were also included in the 2021.04.1 patch release of askcos-site.

askcos-core

  • Adjust custom Celery task state reporting for impurity predictor (MR askcos-core!50)
  • Modularize RelevanceTemplatePrioritizer.predict to reduce code duplication (MR askcos-core!51)
  • Fix clustering error when no precursors were predicted (Issue askcos-core#8, MR askcos-core!52)
  • Do not display C-H site selectivity scores for atoms without a C-H bond (Issue askcos-core#9, MR askcos-core!54)
  • Refactor Dockerfile and Makefile to pass labels to docker build (MR askcos-core!55)
  • Specify template set when looking up target in chemhistorian (MR askcos-core!56)

askcos-data

  • Update Docker build to create askcos-site specific image (MR askcos-data!19, askcos-data!20, askcos-data!21)
  • Add SciFinder/CAS template set and template relevance model (MR askcos-data!22)

askcos-deploy

  • Update django container command for relocated wsgi file (MR askcos-deploy!72)
  • Fix collection dropping issue when seeding custom buyables (MR askcos-deploy!73)
  • Add docker-compose development configuration (MR askcos-deploy!74)
  • Add model_config.yaml and deploy support (MRs askcos-deploy!75, askcos-deploy!77)
  • Update dependency Helm charts and changed template values (MR askcos-deploy!76)
  • Add deploy support for CAS template relevance model (MR askcos-deploy!78)

Docker Compose Deployment

We currently support two methods for deploying ASKCOS: Docker Compose and Kubernetes. Docker Compose is a simpler method for deploying on a single workstation, while Kubernetes is more complex but is suitable for scaling across multiple nodes.

Hardware Requirements

To deploy ASKCOS with the default number of workers, we recommend using a server with at least 16 CPU cores and 64 GB memory. The default configuration uses approximately 45 GB memory at deploy, but usage will increase while running some compute tasks. ASKCOS is not currently set up to use GPUs for machine learning predictions.

For deployment on AWS, this corresponds to an m5.4xlarge instance or similar. (Note that ASKCOS does not work on ARM-based instances.)

For deployment on Google Cloud, this corresponds to an e2-standard-16 instance or similar.

If you plan to increase worker scales, you should increase hardware resources accordingly.

Finally, you should provision at least 40 GB of drive space for a basic deployment. More disk space is recommended for long-term deployments to store user data and support updates and custom models and data.
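
The recommendations above can be checked before deploying. The following is a rough sketch for a Linux host, not part of askcos-deploy. The function names, the 4 GB memory tolerance, and the default path of / for the disk check are assumptions chosen for illustration.

```bash
# Minimums from the hardware recommendations above.
MIN_CORES=16
MIN_MEM_GB=64
MIN_DISK_GB=40
# Assumed tolerance: the kernel reports slightly less memory than is installed.
MEM_MARGIN_GB=4

# Return success if an integer value meets the required minimum.
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Report any resource below the minimum. Linux only: uses nproc, /proc/meminfo, and GNU df.
# The optional argument is a path on the filesystem that will hold Docker data (assumed "/").
preflight() {
  local path=${1:-/} cores mem_gb disk_gb status=0
  cores=$(nproc)
  mem_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
  disk_gb=$(df -BG --output=avail "$path" | tail -n 1 | tr -dc '0-9')
  meets_minimum "$cores" "$MIN_CORES" ||
    { echo "CPU cores: $cores (want $MIN_CORES)"; status=1; }
  meets_minimum "$((mem_gb + MEM_MARGIN_GB))" "$MIN_MEM_GB" ||
    { echo "Memory: ${mem_gb} GB (want $MIN_MEM_GB)"; status=1; }
  meets_minimum "$disk_gb" "$MIN_DISK_GB" ||
    { echo "Free disk: ${disk_gb} GB (want $MIN_DISK_GB)"; status=1; }
  return "$status"
}
```

Running preflight prints one line per shortfall and returns a non-zero status if any minimum is not met.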

Software Prerequisites

To deploy ASKCOS using Docker Compose, you must have git, Docker, and Docker Compose installed on your machine.

Quickstart

ASKCOS can be downloaded using deploy tokens, which provide read-only access to the source code and our container registry in GitLab. Below is a complete example showing how to deploy the ASKCOS application using deploy tokens (omitted in this example). The deploy tokens can be found on the MLPDS Member Resources ASKCOS Versions Page.

bash
$ export DEPLOY_TOKEN_USERNAME=
$ export DEPLOY_TOKEN_PASSWORD=
$ docker login registry.gitlab.com -u $DEPLOY_TOKEN_USERNAME -p $DEPLOY_TOKEN_PASSWORD
$ git clone https://$DEPLOY_TOKEN_USERNAME:$DEPLOY_TOKEN_PASSWORD@gitlab.com/mlpds_mit/askcos/askcos-deploy.git
$ cd askcos-deploy
$ git checkout 2021.07
$ bash deploy.sh deploy
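
Docker warns that passing a password with -p on the command line is insecure. As an alternative sketch, docker login can read the password from standard input through its --password-stdin flag. The helper names require_tokens and registry_login are illustrative and not part of askcos-deploy.

```bash
# Return success only if both deploy token variables are set and non-empty.
require_tokens() {
  [ -n "${DEPLOY_TOKEN_USERNAME:-}" ] && [ -n "${DEPLOY_TOKEN_PASSWORD:-}" ]
}

# Log in to the GitLab registry without placing the password on the command line.
registry_login() {
  require_tokens || {
    echo "Set DEPLOY_TOKEN_USERNAME and DEPLOY_TOKEN_PASSWORD first" >&2
    return 1
  }
  printf '%s' "$DEPLOY_TOKEN_PASSWORD" |
    docker login registry.gitlab.com -u "$DEPLOY_TOKEN_USERNAME" --password-stdin
}
```

Calling registry_login in place of the docker login line leaves the rest of the quickstart unchanged.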

Upgrade Information

The askcos-deploy repository also provides scripts to upgrade an existing ASKCOS deployment in-place.

From 2020.04 or newer

bash
$ git checkout 2021.07
$ bash deploy.sh update -v 2021.07

Some releases include changes or additions which require further action. Depending on the version you are upgrading from, you may need to perform one or more of the following steps.

New in 2021.07

The 2021.07 release includes a new template relevance model trained on a SciFinder/CAS template set. In order to use the new model, you will need to import the reaction templates into MongoDB:

bash
$ bash deploy.sh seed-db -r cas --append

!>Please note that chemical historian data is not included for the CAS model at this time, so chemical popularity information will not be available for tree builder jobs using the CAS model.

The 2021.07 release also introduces a model serving configuration file, located at askcos-deploy/model_config.yaml. Models deployed using TensorFlow Serving or TorchServe must be added to the configuration file to provide connection parameters to ASKCOS. Any existing custom models should be added to ensure they work after updating.

Notes from earlier releases

The 2021.04 release includes new reference data for the Pistachio template relevance model and the aromatic site selectivity model. The reference data must be imported into MongoDB to take advantage of the new features:

bash
$ bash deploy.sh seed-db -e default -x default

The 2020.10 release included a new Pistachio template set and template relevance model. If you have not already done so, you will need to seed some new data into the mongo database:

bash
$ bash deploy.sh seed-db -c pistachio -r pistachio --append

The 2020.10 release also included an updated set of default buyables data. If you have not already done so, you can import the new data using the following command:

bash
$ bash deploy.sh seed-db -b default --append

Note that this will result in some duplicate data. If you have not added custom buyables data, you can drop the existing buyables database and import the updated data by omitting the --append argument.

!>In some cases, we have seen issues with resetting rabbitmq data while upgrading from 2020.07 to 2020.10. If you see celery workers restarting after updating and inequivalent arg 'x-max-priority' errors in worker logs, you should restart rabbitmq again using docker-compose rm -fsv rabbit && docker-compose up -d rabbit.
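
The check described in the note above can be scripted. This is a minimal sketch: the helper names are illustrative, and the log text is read from standard input so that any log source can be used.

```bash
# Return success if log text on stdin contains the RabbitMQ priority mismatch error.
has_priority_error() {
  grep -q "inequivalent arg 'x-max-priority'"
}

# Restart RabbitMQ using the commands given in the note above.
restart_rabbit() {
  docker-compose rm -fsv rabbit && docker-compose up -d rabbit
}

# Example, run from the askcos-deploy directory:
#   docker-compose logs --tail=200 2>&1 | has_priority_error && restart_rabbit
```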

The 2020.07 release introduced new index types which significantly improved database lookup performance. If you have not already done so, you should re-index the database using the following command:

bash
$ bash deploy.sh index-db --drop-indexes

From versions prior to v0.4.1

Upgrading from earlier versions of ASKCOS directly to 2021.07 has not been thoroughly tested. Instead, we suggest upgrading to v0.4.1 as an intermediate step.

bash
$ git checkout v0.4.1
$ bash backup.sh
$ bash deploy.sh update -v 0.4.1
$ bash deploy.sh set-db-defaults seed-db
$ bash restore.sh

After upgrading to v0.4.1, you should follow the above instructions to upgrade to 2021.07.

!>Note: A large amount of data was migrated to the mongodb in v0.4.1 (chemhistorian), and seeding may take some time to complete. We send this seeding task to the background so the rest of the application can start and become functional without having to wait. If using the default set of data (i.e., using the exact commands above), you can monitor the progress of mongodb seeding using bash deploy.sh count-mongo-docs, which reports how many documents have been seeded out of the expected number. Complete seeding is not necessary for application functionality unless you use the chemical popularity logic in the tree builder.

First Time Deployment

Deploying the Web Application

Deployment is initiated by a bash script that runs a few docker-compose commands in a specific order. Several database services need to be started first, and more importantly seeded with data, before other services (which rely on the availability of data in the database) can start. The deploy.sh script is provided in the askcos-deploy repository and should be run as follows:

bash
$ bash deploy.sh command [optional arguments]

For a full list of available commands and options, use the help command.

There are a number of available commands for common deploy tasks:

  • deploy: runs standard first-time deployment tasks, including seed-db
  • update: pulls new docker image from GitLab repository and restarts all services
  • seed-db: seed the database with default or custom data files
  • start: start a deployment without performing first-time tasks
  • stop: stop a running deployment
  • clean: stop a running deployment and remove all docker containers and volumes

For a running deployment, new data can be seeded into the database using the seed-db command along with arguments indicating the types of data to be seeded. Note that this will replace the existing data in the database. The available arguments are as follows:

  • -b, --buyables: specify buyables data to seed, either default or path to data file
  • -c, --chemicals: specify chemicals data to seed, either default or path to data file
  • -x, --reactions: specify reactions data to seed, either default or path to data file
  • -r, --retro-templates: specify retrosynthetic templates to seed, either default or path to data file
  • -f, --forward-templates: specify forward templates to seed, either default or path to data file
  • -e, --references: specify model reference data to seed, only supports default currently

For example, to seed default buyables data and custom retrosynthetic templates, run the following from the deploy folder:

bash
$ bash deploy.sh seed-db --buyables default --retro-templates /path/to/my.retro.templates.json.gz
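
Because seed-db replaces existing data unless --append is given, it can help to confirm that a custom data file is readable before running it. The following is a small sketch with illustrative helper names, assuming it is run from the askcos-deploy directory.

```bash
# Return success if a seed-db value is the keyword "default" or a readable regular file.
valid_seed_source() {
  [ "$1" = "default" ] || { [ -f "$1" ] && [ -r "$1" ]; }
}

# Append retrosynthetic templates only after the data source has been checked.
seed_retro_templates() {
  local src=$1
  valid_seed_source "$src" || { echo "Cannot read: $src" >&2; return 1; }
  bash deploy.sh seed-db --retro-templates "$src" --append
}

# Example:
#   seed_retro_templates /path/to/my.retro.templates.json.gz
```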

To update a deployment, run the following from the deploy folder:

bash
$ bash deploy.sh update --version x.y.z

To stop a currently running application, run the following from the deploy folder:

bash
$ bash deploy.sh stop

If you would like to clean up and remove everything from a previous deployment (NOTE: you will lose user data), run the following from the deploy folder:

bash
$ bash deploy.sh clean

Backing Up User Data

From v0.3.1 or above

If you are upgrading from v0.3.1 or later, the backup/restore process is no longer needed unless you are moving deployments to a new machine.

New backup and restore functions were added in askcos-deploy 2020.07 to provide more robust backup/restore capabilities based on Docker volumes. The commands can be used whether the site is running or not; the only requirement is that the mongo_data and mysql_data Docker volumes exist.

To backup:

bash
$ bash deploy.sh backup [-d /absolute/path/to/backup/dir]

To restore:

bash
$ bash deploy.sh restore [-d /absolute/path/to/backup/dir]

!>Note: These backup and restore processes are run in a bare alpine linux image which will be automatically pulled by Docker.
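
As a sketch of a dated backup routine built on these commands, the following helper first confirms that both data volumes exist. The helper names are illustrative, and matching volume names by suffix is an assumption, since Docker Compose may prefix volume names with the project name.

```bash
# Return success if the volume names on stdin include both data volumes.
# Matches on suffix to allow for a project-name prefix (an assumption about local setup).
data_volumes_present() {
  local names
  names=$(cat)
  printf '%s\n' "$names" | grep -Eq '(^|_)mongo_data$' &&
    printf '%s\n' "$names" | grep -Eq '(^|_)mysql_data$'
}

# Back up into a dated directory under a caller-supplied absolute path.
dated_backup() {
  local root=$1 dir
  dir="$root/$(date +%Y%m%d)"
  docker volume ls --format '{{.Name}}' | data_volumes_present ||
    { echo "mongo_data or mysql_data volume not found" >&2; return 1; }
  mkdir -p "$dir" && bash deploy.sh backup -d "$dir"
}

# Example, run from the askcos-deploy directory:
#   dated_backup /absolute/path/to/backup/dir
```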

From v0.2.x or v0.3.0

If you are upgrading the deployment from a previous version (prior to v0.3.1), or moving the application to a different server, you may want to retain user accounts and user-saved data/results. The provided backup.sh and restore.sh scripts in the askcos-deploy/utils/legacy directory are capable of handling the backup and restoring process. Please read the following carefully so as to not lose any user data:

  1. Start by making sure the previous version you would like to backup is currently up and running with docker-compose ps.
  2. Checkout the newest version of the askcos-deploy: git checkout 2021.07
  3. Run $ bash utils/legacy/backup.sh
  4. Make sure that the deploy/backup folder is present, and there is a folder with a long string of numbers (year+month+date+time) that corresponds to the time you just ran the backup command
  5. If the backup was successful (db.json and user_saves (<v0.3.1) or results.mongo (>=0.3.1) should be present), you can safely tear down the old application with docker-compose down [-v]
  6. Deploy the new application with bash deploy.sh deploy or update with bash deploy.sh update -v x.y.z
  7. Restore user data with bash utils/legacy/restore.sh

!>Note: For versions >=0.3.1, user data persists in docker volumes and is not tied to the lifecycle of the container services. If the [-v] flag is not used with docker-compose down, volumes do not get removed, and user data is safe. In this case, the backup/restore procedure is not necessary as the containers that get created upon an install/upgrade will continue to use the docker volumes that contain all the important data. If the [-v] flag is used, all data will be removed and a restore will be required to recover user data.

Add Customization

There are a few parts of the application that you can customize:

  • Header sub-title next to ASKCOS (to designate this as a local deployment at your organization)
  • Email addresses for the support form
  • Whether to enable the chemical name to SMILES resolver
  • Whether authorization is required to modify the buyables database
  • Add internal URL to a Pistachio web app deployment to enable direct links

These are handled as environment variables that can change upon deployment (and are therefore not tied into the image directly). They can be found in the customization file, which is created automatically during deployment from the customization.example file.

In addition, the following methods enable more substantial customizations to the ASKCOS website without rebuilding the askcos-site image:

  • Customization of Django site settings
    • Include customizations in the askcos-deploy/custom_django_settings.py file which is mounted to /usr/local/askcos-site/askcos_site/custom_settings.py in the app container
  • Customization of web frontend
    • Include custom script or css tags in a custom_head.html Django template file which is mounted to /usr/local/askcos-site/askcos_site/templates/custom_head.html and included in the <head> section of every page

Please let us know what other degrees of customization you would like.

Managing Django

To manage the Django app (i.e., run python manage.py ...), for example to create an admin superuser, you can run commands in the running app service as follows:

bash
$ docker-compose exec app bash -c "python /usr/local/askcos-site/manage.py createsuperuser"

In this case you'll be presented with an interactive prompt to create a superuser with your desired credentials.

Scaling Workers

By default, only one worker per queue is deployed, with limited concurrency, which may not keep up with demand from many concurrent users. The scaling of each worker is defined at the top of the deploy.sh script. To scale a desired worker, change the appropriate value in deploy.sh, for example:

n_tb_c_worker=N          # Tree builder chiral worker

where N is the number of workers you want. Then run bash deploy.sh start [-v <version>].
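
The edit can also be scripted. This is a minimal sketch, assuming the variable sits at the start of a line in deploy.sh and currently holds a plain integer; the helper name set_worker_scale is illustrative.

```bash
# Set the count for one worker variable in a deploy script, keeping a .bak copy.
# Assumes lines of the form: n_tb_c_worker=1   # comment
set_worker_scale() {
  local file=$1 var=$2 count=$3
  case $count in
    ''|*[!0-9]*) echo "count must be a non-negative integer" >&2; return 1 ;;
  esac
  sed -i.bak -E "s/^(${var}=)[0-9]+/\1${count}/" "$file"
}

# Example, run from the askcos-deploy directory:
#   set_worker_scale deploy.sh n_tb_c_worker 2
```

The trailing comment on the line is preserved because only the digits after the equals sign are replaced.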

Kubernetes Deployment

ASKCOS 2021.07 includes a Helm chart to make it easier to deploy ASKCOS on Kubernetes. The previous Kubernetes configuration can still be used for 2020.07 or earlier but will no longer be updated.

Hardware Requirements

To deploy ASKCOS with the default number of workers, we recommend using a server with at least 16 CPU cores and 64 GB memory combined across nodes, and individual nodes with at least 16 GB memory. The default configuration uses approximately 45 GB memory total at deploy, with the most resource intensive worker needing about 14 GB, but usage will increase while running some compute tasks. ASKCOS is not currently set up to use GPUs for machine learning predictions.

For deployment on AWS, this corresponds to one m5.4xlarge instance or two m5.2xlarge instances. (Note that ASKCOS does not work on ARM-based instances.)

For deployment on Google Cloud, this corresponds to an e2-standard-16 instance or two e2-standard-8 instances.

If you plan to increase worker scales, you should increase hardware resources accordingly.

Software Prerequisites

In addition to git and Docker, we will assume that you are using a cluster which already has Kubernetes configured. You will also need to install Helm 3: https://helm.sh/docs/intro/install/.

Quickstart

Similar to the Docker Compose deployment, you will need to obtain the ASKCOS deploy tokens in order to clone the askcos-deploy repository and access the GitLab image registry. The deploy tokens can be found on the MLPDS Member Resources ASKCOS Versions Page.

bash
$ export DEPLOY_TOKEN_USERNAME=
$ export DEPLOY_TOKEN_PASSWORD=
$ git clone https://$DEPLOY_TOKEN_USERNAME:$DEPLOY_TOKEN_PASSWORD@gitlab.com/mlpds_mit/askcos/askcos-deploy.git
$ cd askcos-deploy
$ git checkout 2021.07
$ helm install --set imageCredentials.username=$DEPLOY_TOKEN_USERNAME --set imageCredentials.password=$DEPLOY_TOKEN_PASSWORD mydeploy ./helm/askcos

For more configuration options, please check out the values file at askcos-deploy/helm/askcos/values.yaml.

Add Customization

For Kubernetes, the same customizations can be applied as for the Docker Compose deployment:

  • Header sub-title next to ASKCOS (to designate this as a local deployment at your organization)
  • Email addresses for the support form
  • Whether to enable the chemical name to SMILES resolver
  • Whether authorization is required to modify the buyables database.

The environment variables for these customizations can be adjusted in the env block of the values.yaml file.

Managing Django

To manage the Django app (i.e., run python manage.py ...), for example to create an admin superuser, you can run commands in the running app container as follows:

bash
$ kubectl exec [ASKCOS POD] -c app -i -t -- python /usr/local/askcos-site/manage.py createsuperuser

In this case you'll be presented with an interactive prompt to create a superuser with your desired credentials.
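
The [ASKCOS POD] placeholder has to be replaced with a real pod name. One way to look it up is sketched below. The helper name and the "app" substring are assumptions about local pod naming, so check the output of kubectl get pods for the actual names in your cluster.

```bash
# Print the first pod name from `kubectl get pods -o name` output (read from stdin)
# that contains the given substring. Hypothetical helper, not part of askcos-deploy.
first_matching_pod() {
  local pattern=$1
  sed 's|^pod/||' | grep -F -- "$pattern" | head -n 1
}

# Example (the "app" substring is a placeholder):
#   pod=$(kubectl get pods -o name | first_matching_pod app)
#   kubectl exec "$pod" -c app -i -t -- python /usr/local/askcos-site/manage.py createsuperuser
```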

Scaling Workers

For Kubernetes, worker replicas can also be set in the values.yaml file. Celery workers are defined in the celery block as a list, and each item has a replicaCount field for setting the number of replicas.

(Optional) Building Docker Images

If you would like to build the askcos-site Docker image yourself, you will need to download the appropriate repositories depending on where you want to start.

To only build askcos-site using a pre-built askcos-core image:

bash
$ git clone https://gitlab.com/mlpds_mit/askcos/askcos-site
$ cd askcos-site
$ make [TAG=my_tag]

A Makefile is provided to make it easier to build the image with a default image name. You can also use the docker build command directly:

bash
$ docker build -t <image name>:<tag> .

!>Note: The image name should correspond with what exists in the docker-compose.yml file. By default, the image name is environment variable ASKCOS_IMAGE_REGISTRY + askcos-site. If you choose to use a custom image name, make sure to modify the ASKCOS_IMAGE_REGISTRY variable or the docker-compose.yml file accordingly. For Kubernetes deployment, the image registry and tag are defined in the values.yaml file.

Similarly, if you also want to build askcos-core:

bash
$ git clone https://gitlab.com/mlpds_mit/askcos/askcos-core
$ cd askcos-core
$ make [TAG=my_tag]

Note that you will need to specify the appropriate askcos-core version when building askcos-site afterwards:

bash
$ cd askcos-core
$ make TAG=my_tag
$ cd ../askcos-site
$ make CORE_VERSION=my_tag TAG=my_tag

ASKCOS Development

Software package for the prediction of feasible synthetic routes towards a desired compound and associated tasks related to synthesis planning. Originally developed under the DARPA Make-It program and now being developed under the MLPDS Consortium.

Released under the MIT License.