v0.4.0 Release Notes
Video demo of new UI features: WIP
User notes:
- New site selectivity module to predict the most likely site of aromatic C-H functionalization.
- New clustering of one-step retrosynthetic predictions in the Interactive Path Planning user interface.
- New visualization of tree builder results in the Interactive Path Planning user interface. The entire expanded graph can now be explored, rather than only the pathways that were resolved.
- The tree builder is now more resilient to large, complex molecules and long expansion times; "Connection reset by peer" errors should occur much less frequently.
- Updated webpage for viewing molecules in the buyables database. This page now also supports modifying the buyables database in real time.
- Improved API error handling.
- Molecular weights of compounds are now shown in the forward predictor results.
Developer notes:
- Support for Kubernetes deployments.
- Added an option to skip preloading retrosynthetic templates upon deployment and instead create them on-the-fly as needed.
- Groundwork for unit tests.
- Automatically generated documentation.
- Added an option to skip https altogether, though this is not recommended.
Bug fixes:
- "Connection reset by peer" should be observed much less frequently.
- The synthesis predictor export format is now CSV.
- Link to export Reaxys reactions supporting a given template now works again.
- Docker images are no longer built on-the-fly.
- Results saved with older versions of ASKCOS can now be restored in newer versions.
Upgrade information
The easiest way to upgrade to the new version of ASKCOS is with Docker and docker-compose. To get started, make sure both docker and docker-compose are installed on your machine. We host a pre-built docker image of ASKCOS on GitLab. It is a private repository; if you do not have access to pull the image, please contact us. In addition, you need the deploy/ folder from our code repository. To get the most recent version of ASKCOS:
$ docker login registry.gitlab.com # enter credentials
$ docker pull registry.gitlab.com/mlpds_mit/askcos/askcos:0.4.0
Then, follow the instructions under "How do I upgrade ASKCOS to a new version?" below using the new version of the deploy folder from this repository.
Using GitLab Deploy Tokens
We are in the process of migrating our development space from GitHub to GitLab, encouraged by a number of beneficial features GitLab provides. The first of these is deploy tokens, which provide read-only access to our source code and container registry. For the next few releases, code and containers will continue to be made available via the GitHub and DockerHub repositories you may be familiar with, but eventually we plan to move exclusively to GitLab. Below is a complete example showing how to deploy the askcos application using deploy tokens (token values are omitted in this example). The only software prerequisites are git, docker, and docker-compose.
$ export DEPLOY_TOKEN_USERNAME=
$ export DEPLOY_TOKEN_PASSWORD=
$ git clone https://$DEPLOY_TOKEN_USERNAME:$DEPLOY_TOKEN_PASSWORD@gitlab.com/mlpds_mit/askcos/askcos.git
$ docker login registry.gitlab.com -u $DEPLOY_TOKEN_USERNAME -p $DEPLOY_TOKEN_PASSWORD
$ docker pull registry.gitlab.com/mlpds_mit/askcos/askcos:0.4.0
$ cd askcos/deploy
$ git checkout v0.4.0
$ bash deploy.sh
NOTE: The git clone command pulls enough to deploy the application, but not all of the data files hosted using Git Large File Storage (LFS). To acquire the complete source code repository including large data files (if you want to rebuild a custom image, for example), please install git lfs and pull the rest of the repository.
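For example, assuming git-lfs is installed and available on your PATH, the remaining large files can be fetched from inside the cloned repository:
$ git lfs install
$ git lfs pull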
First Time Deployment with Docker
Prerequisites
- If you're building the image from scratch, make sure git (and git lfs) are installed on your machine
- Install Docker (follow the OS-specific installation instructions)
- Install docker-compose (follow the installation instructions)
Deploying the web application
The entrypoint for deployment is a bash script that runs a few docker-compose commands in a specific order. A few of the database services need to be started first, and more importantly seeded with data, before other services (which rely on the availability of data in the database) can start. The bash script can be found and should be run from the deploy folder as follows:
$ bash deploy.sh
There are four optional arguments you can pass along (see the example after this list):
- --skip-seed: This will skip seeding the mongo database. Unless you know that the mongo database is already up and running with data, you should seed the database.
- --skip-ssl: This will skip the generation of a random self-signed ssl certificate. If you are supplying your own certificate, use this option so as not to overwrite it.
- --skip-https: This will skip requiring https altogether. This option is not recommended, but was added in case users' browsers do not allow viewing pages with invalid certificates.
- --skip-migration: This will skip performing the db migration required by django. Only use this if you know the migration has already been performed and the db models have not changed.
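For example, to redeploy against a database that is already seeded and migrated (an illustrative combination; adjust to your situation):
$ bash deploy.sh --skip-seed --skip-migration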
To stop a currently running application, run the following from the deploy folder, where you ran deploy.sh:
$ docker-compose stop
If you would like to clean up and remove everything from a previous deployment (NOTE: you will lose user data), run the following from the deploy folder:
$ docker-compose down -v
Upgrading or moving deployments
Backing up user data
If you are upgrading the deployment from a previous version, or moving the application to a different server, you may want to retain user accounts and user-saved data/results. Prior to version 0.3.1, user data was stored in an sqlite db at askcos/db.sqlite3 and a user_saves directory at makeit/data/user_saves, in the running app container service. Versions >=0.3.1 use a mysql service for user data and a mongo db for user results. Although the backup/restore process differs between versions, the backup.sh and restore.sh scripts are capable of handling both. Please read the following carefully so as not to lose any user data:
- Start by making sure the previous version you would like to back up is currently up and running with docker-compose ps.
- Check out the newest version of the source code (only the deploy folder is necessary).
- Run bash backup.sh.
- Make sure that the deploy/backup folder is present and contains a folder named with a long string of numbers (year+month+date+time) corresponding to the time you just ran the backup command.
- If the backup was successful (db.json and user_saves (<v0.3.1) or results.mongo (>=0.3.1) should be present), you can safely tear down the old application with docker-compose down -v.
- Deploy the new application with bash deploy.sh.
- Restore user data with bash restore.sh.
Note: For versions >=0.3.1, user data persists in docker volumes and is not tied to the lifecycle of the container services. In other words, as long as you do not include the -v flag with docker-compose down, volumes do not get removed and user data is safe. In this case, the backup/restore procedure is not necessary: the new containers created upon an upgrade will continue to use the docker volumes that contain all the important data.
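To check which named volumes currently exist (these are what the -v flag would remove), you can list them with a standard docker command:
$ docker volume ls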
(Optional) Building the ASKCOS Image
The askcos image itself can be built using the Dockerfile in this repository.
$ git clone https://gitlab.com/mlpds_mit/askcos/askcos
$ cd askcos/
$ git lfs pull
$ docker build -t askcos .
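If you want the provided docker-compose configuration to use your locally built image, one approach (an assumption about your compose setup; adjust the tag to whatever image name it references) is to retag the build before deploying:
$ docker tag askcos registry.gitlab.com/mlpds_mit/askcos/askcos:0.4.0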
Add customization
There are a few parts of the application that you can customize:
- Header sub-title next to ASKCOS (to designate this as a local deployment at your organization)
- Contact emails for centralized IT support
These are handled as environment variables that can change upon deployment (and are therefore not tied into the image directly). They can be found in deploy/customization. Please let us know what other degrees of customization you would like.
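As a hypothetical sketch (the variable names below are illustrative only; the real ones are defined in deploy/customization), the file might be edited before deployment like so:
$ cat deploy/customization
# hypothetical variable names, for illustration only
CUSTOM_HEADER="My Organization"
SUPPORT_EMAIL=it-support@example.com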
Managing Django
If you'd like to manage the Django app (i.e., run python manage.py ...), for example to create an admin superuser, you can run commands in the running app service (do this after docker-compose up) as follows:
docker-compose exec app bash -c "python /usr/local/ASKCOS/askcos/manage.py createsuperuser"
In this case you'll be presented an interactive prompt to create a superuser with your desired credentials.
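The same pattern works for other Django management commands; for example, applying database migrations manually (assuming the same container path as above):
docker-compose exec app bash -c "python /usr/local/ASKCOS/askcos/manage.py migrate"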
Important Notes
First startup
The celery worker will take a few minutes to start up (possibly up to 5 minutes; it reads a lot of data into memory from disk). The web app itself will be ready before this; however, upon the first GET request (only the first for each process), a few files will be read from disk, so expect a 1-2 second delay.
Scaling workers
Only 1 worker per queue is deployed by default, with limited concurrency. This is not ideal for many-user demand. You can easily scale the number of celery workers with docker-compose up -d --scale tb_c_worker=N, where N is the number of workers you want. The above note about startup time applies to each worker you start, and each worker will consume RAM.
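For example, to run four workers on the tb_c_worker queue:
$ docker-compose up -d --scale tb_c_worker=4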
How to run individual modules
Many of the individual modules -- at least the ones that are the most interesting -- can be run "standalone". Examples of how to use them are often found in the if __name__ == '__main__' statement at the bottom of the script definitions. For example...
Using the learned synthetic complexity metric (SCScore)
makeit/prioritization/precursors/scscore.py
Obtaining a single-step retrosynthetic suggestion with consideration of chirality
makeit/retrosynthetic/transformer.py
Finding recommended reaction conditions based on a trained neural network model
makeit/synthetic/context/neuralnetwork.py
Using the template-free forward predictor
makeit/synthetic/evaluation/template_free.py
Using the coarse "fast filter" (binary classifier) for evaluating reaction plausibility
makeit/synthetic/evaluation/fast_filter.py
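For instance, a module can be invoked directly with Python from the repository root (a sketch; each script's __main__ block determines the exact behavior and output):
$ python makeit/prioritization/precursors/scscore.py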
Integrated CASP tool
For the integrated synthesis planning tool at makeit/application/run.py, there are several options available. The currently enabled options for the command-line tool can be found at makeit/utilities/io/arg_parser.py. There are some options that are only available for the website and some that are only available for the command-line version. An example of the former is the consideration of popular but non-buyable chemicals as suitable "leaf nodes" in the search. An example of how to use this module is:
python ASKCOS/Make-It/makeit/application/run.py --TARGET atropine
Model choices
The following options influence which models are used to carry out the different tasks within the algorithm.
Context recommendation: via '--context_recommender', currently has the following options:
- 'Nearest_Neighbor': uses a nearest-neighbor database search (memory intensive, ~30 GB, and slow; relies on an external data file)
- 'Neural_Network': uses a pretrained neural network (highly recommended!)
Context prioritization: via '--context_prioritization', specifies how we should determine the "best" context for a proposed reaction. It currently has the following options:
- 'Probability': uses the likelihood of success for the reaction under that condition
- 'Rank': uses the rank of the reaction under that condition relative to all other outcomes
Forward evaluation: via '--forward_scoring', is used to evaluate the likelihood of success of a reaction. It currently has the following options:
- 'Template_Based': uses the original forward evaluation method, enumerating all possible outcomes by applying templates and then predicting the most likely main product [https://pubs.acs.org/doi/abs/10.1021/acscentsci.7b00064] (NOTE: the template-based forward predictor requires a custom-built version of RDKit from https://github.com/connorcoley/rdkit - we highly recommend using the template-free approach)
- 'Template_Free': uses the higher-performing and faster template-free method based on graph convolutional neural networks [https://arxiv.org/abs/1709.04555]
- 'Fast_Filter': uses a binary classifier to distinguish good and bad reaction suggestions. It is imperfect but very fast. Based on the "in-scope filter" suggested by Marwin Segler [https://www.nature.com/articles/nature25978]
Retrosynthetic template prioritization: via '--template_prioritization', is used to minimize the number of reaction templates that must be applied to the target compound at each iteration. It currently has the following options:
- 'Relevance': quantifies how relevant a given template is for the considered reactants, based on the approach suggested by Marwin Segler [https://onlinelibrary.wiley.com/doi/abs/10.1002/chem.201605499]
- 'Popularity': ranking based on the number of references in the literature, independent of the product species
Precursor prioritization: via '--precursor_prioritization', is used to determine which precursor is the most promising branch to pursue. It currently has the following options:
- 'Heuristic': simple heuristic, with decreasing score as the number of atoms, rings, and chiral centers increases
- 'SCScore': Synthetic Complexity Score - a learned quantity indicating how complex a molecule is. Tries to interpret molecules with a protection/deprotection group as less complex than their non-protected counterparts. [https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00622]
Tree scoring: via '--tree_scoring', determines how final synthesis trees should be sorted/ranked. It currently has the following options:
- 'Product': uses the product of the template score and the forward prediction score
- 'Forward_only': uses only the forward prediction score
- 'Template_only': uses only the template score
Limits and thresholds: the following options set limits for the different parts of the program
Expansion time via '--expansion_time': limit the amount of time the program spends expanding the retrosynthetic tree. Default value is 60 seconds.
Maximum search depth via '--max_depth': limit the search depth of the retrosynthetic expansion. Default value is 4.
Maximum degree of branching via '--max_branching': limit the number of branches generated in each layer of the retrosynthetic tree. Default value is 20.
Maximum number of buyable trees via '--max_trees': limit the number of buyable trees the program should search for. Default value is 500.
Maximum number of templates to be applied via '--template_count': limit the number of templates that are applied for each expansion when using the popularity prioritizer. Default value is 10000.
Minimum count for a template to be considered in the retrosynthetic direction for non-chiral reactions via '--mincount_retro'. Default value is 25.
Minimum count for a template to be considered in the retrosynthetic direction for chiral reactions via '--mincount_retro_c'. Default value is 10.
Minimum count for a template to be considered in the synthetic direction via '--synth_mincount'. Default value is 25.
Maximum rank for considering a target feasible via '--rank_threshold'. Default value is 10.
Minimum probability for considering a target feasible via '--prob_threshold'. Default value is 0.01.
Maximum number of contexts to be proposed for each reaction via '--max_contexts'. Default value is 10.
Maximum price per gram for a component to be considered buyable via '--max_ppg'. Default value is 100.
Precursor filtering: via '--apply_fast_filter' and '--filter_threshold', imposes rapid filtering of low-quality retrosynthetic suggestions. Defaults are True and 0.75, respectively.
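Putting several of these options together, an illustrative invocation (the flag values here are examples, not recommendations) might be:
python ASKCOS/Make-It/makeit/application/run.py --TARGET atropine --expansion_time 120 --max_depth 5 --max_branching 25 --context_recommender Neural_Network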
ASKCOS Development
Software package for the prediction of feasible synthetic routes towards a desired compound and associated tasks related to synthesis planning. Originally developed under the DARPA Make-It program and now being developed under the MLPDS Consortium.