(sec:DOCKER.detailed)= # More details on the ORCA DOCKER :::{note} The content here will assume you already have some basic knowledge from {ref}`sec:DOCKER.typical`, if not, please check that section first! ::: ## Underlying theory The basic idea behind the DOCKER is to use a type of swarm intelligence {cite}`shami_particle_2022` to find local minima for where the `HOST` would best be positioned, based on the energies and gradients of a given Potential Energy Surface (PES). Swarm intelligence algorithms were invented trying to mimic the behavior of animals such as bees and ants, and it actually fits the problem here quite well (as shown below). The basic steps needed for the algorithm are: 1. Optimize `HOST` and `GUEST`. 2. Create a spacial grid around the `HOST`. 3. Initialize a possible set of random solutions. 4. Let the swarm intelligence find local minima. 5. Collect a fraction of the best solutions found on 4. and fully optimize them. 6. The structure with the lowest energy is considered the best solution. To quickly demonstrate how the initial set of solutions \"evolve\" to find different local minima, take as an example the substituted biphenyl as `HOST` below. The figure below demonstrates how a water molecule (`GUEST`) is docked onto the initial `HOST`. Each gray ball represents one possible placing of the water molecule (ignoring here its rotation angle). At the start, they are all distributed over the `HOST` without any bias. As the iterations go by, the algorithm starts to detect that both sides of the molecule have lower energy solutions than the aromatic rings, as expected, and the possible solutions start to converge to those points. In the end, solutions containing water molecules binding to both sides are taken, the system is then fully optimized and one finds that the best biding occurs with the hydroxy group to the right. (fig:docker_PSO)= :::{figure} ../../images/docker_PSO.* The docking process of a water molecule on the substituted biphenyl. ::: :::{important} Right now the DOCKER can only be used together with the fast GFN-XTB methods, it will be generalized later. It is also not fully exploring the parallelization potential yet, and can be much faster for the next versions. ::: ## Looking Deeper into the Output The details related to the grid and the swarm optimization can be found on the output, here is one example taken from the water dimer example from the earlier section: ```orca Creating spatial grid Grid Max Dimension 5.50 Angs Angular Grid Step 32.73 degrees Cartesian Grid Step 0.50 Angs Points per Dimension 11 points Initializing workers Population Density 0.50 worker/Ang^2 Population Size 57 Evolving structures Minimization Algorithm mutant particle swarm Min, Max Iterations (3 , 10) ``` The `Population Density` here defines how many solutions will be placed on the initial grid around the `HOST` and can be controlled by `%DOCKER POPDENSITY 0.50 END`, while the population size is a direct consequence of that, unless specified by `POPSIZE` under `%DOCKER`. Another important piece of information is the `Min, Max Iterations`, which define the minimum number of iterations before checking for convergence, and the maximum possible number of iterations. These can be changed by setting `MINITER` and `MAXITER` under the `%DOCKER`. Then during the swarm search itself (here called the \"evolution step\"), the `HOST` and `GUEST` are partially optimized using the same criteria as from `!SLOPPYOPT`, and their energy at convergence is taken as a convergence criteria. The printing, still for the previous example, reads: :::{note} All of the optimizations respect constraints and/or wall potentials given on the `%GEOM` block. The `HOST` can be easily frozen with `%DOCKER FIXHOST TRUE END`. ::: ``` Iter Emin avDE stdDE Time (Eh) (kcal/mol) (kcal/mol) (min) ------------------------------------------------------- 1 -10.147462 2.756033 1.821981 0.03 2 -10.147462 2.121389 1.610208 0.03 3 -10.148583 2.313606 1.365227 0.03 4 -10.148583 1.846998 1.188680 0.02 5 -10.148583 1.587332 1.168207 0.02 No new minimum found after 3 (MinIter) steps. ``` Here we have the number of iterations, the energy of the best solution found during the process, the average energy difference from the lowest solutions and its standard deviation. The first number shows if the system is evolving towards a lower energy solutions and the later two give an idea of the \"spread\" of the energies found. If no new minimum is found over `MINITER` iterations, the algorithm is considered to be converged and the best solutions are taken to the final optimization step. ## The final steps After the evolution step, a total of $max(sqrt(PopSize),5)$ structures are taken from the set of solutions for a final full optimization. The output looks like: ```orca Running final optimization Maximum number of structures 7 Minimum energy difference 0.10 kcal/mol Maximum RMSD 0.25 Angs Optimization strategy regular Coordinate system redundant 2022 Fixed host false ``` To avoid redundant solutions being optimized here, a check is made such that only structures with an energy difference of at least a `Minimum energy difference` and `Maximum RMSD` (obtained after an optimal rotation using the quaternion approach) are taken. The coordinate system might also be automatically set to Cartesian if the number of atoms is above 300 to speedup. If some optimizations fail during the process, they will be flagged as `optimization failed`, as in the example below: ```orca Struc Eopt Interaction Energy Time (Eh) (kcal/mol) (min) ------------------------------------------------ 1 -10.149006 -4.968378 0.01 2 -10.149005 -4.967965 0.01 3 optimization failed 0.01 4 -10.149007 -4.968641 0.01 (...) ``` That only means some of the solutions failed to fully optimize at the end, and on the printed file their energy is set to 1000 Eh to show the job was not completed. If you want to increase the number of iterations, or change the final convergence criteria, just change them using the `%GEOM` block as usual. ## General Tips 1. In order to quick change the PES of the docking process, you can use `!DOCK(GFN2-XTB)`, `!DOCK(GFN1-XTB)`, `!DOCK(GFN0-XTB)` or `!DOCK(GFN-FF)`. 2. For a quicker and less accurate docking the input `!QUICKDOCK` is also available. 3. There are four levels of \"docking\" procedure, from simple to more elaborate: `SCREENING`, `QUICK`, `NORMAL` (default) and `COMPLETE`, which can be given as `%DOCK DOCKLEVEL COMPLETE END`. 4. To increase the final number of structures optimized in the end, just change `NOPT` under `%DOCKER`. 5. If the guest topology is somehow changed during the docking process, that structure will be discarded. This can be turned off by setting `%DOCK CHECKGUESTTOPO FALSE END`. :::{important} By the time of the ORCA6 release, this algorithm was still not published. Publications are under preparation. ::: ## Complete Keyword List ```orca !QUICKDOCK # simple keyord to set DOCKLEVEL QUICK !NORMALDOCK # simple keyord to set DOCKLEVEL NORMAL !COMPLETEDOCK # simple keyord to set DOCKLEVEL COMPLETE !DOCK(GFN-FF) # simple keyord to set EVPES GFNFF !DOCK(GFN0-XTB) # simple keyord to set EVPES GFN0XTB !DOCK(GFN1-XTB) # simple keyord to set EVPES GFN1XTB !DOCK(GFN2-XTB) # simple keyord to set EVPES GFN2XTB %DOCKER # # general options # GUEST "filename.xyz" # an .xyz file (can be multistructure), from where # the guest(s) will be read. can contain different # charges and multiplicities for each guests on the # comment line. will only be read if exactly two # integer numbers are given, otherwise ignored. DOCKLEVEL SCREENING # defines a general strategy for docking. QUICK # will alter things like that population density NORMAL # and final number of optimized structrures. COMPLETE # default is NORMAL. PRINTLEVEL LOW NORMAL # default HIGH # will print many extra files! NREPEATGUEST 1 # number of times to repeat the content of the "GUEST" file CUMULATIVE TRUE # add the contents of the "GUEST" file one on # top of each other? # default is FALSE, meaning each will be done independently. GUESTCHARGE 0 # can be used to defined a charge for the guest (default 0) GUESTMULT 3 # same for multiplicty (default 1) # both can also be given via the comment line of the # "filename.xyz", see above. RAMDOMSEED TRUE # whether to allow for the process to be trully random and # will give different results everytime (default FALSE) FIXHOST TRUE # freeze coordinatef for the HOST during all steps? # (default FALSE) CHECKGUESTTOPO FALSE # make sure that the topolgy of the GUEST is always kept # (default TRUE) # # evolution step # MAXITER 10 # maximum number of iterations MINITER 3 # minimum number to check for convergence # and minimum required of equal steps # to signal convergence POPDENSITY 0.5 # the population density based on the HOST grid POPSIZE 150 # a fixed number for the swarm population size EVOPTLEVEL NORMALOPT # use default optimization criteria during the evolution step # instead of the !SLOPPYOPT? more accurate, but slower. EVPES GFNFF # which PES to use **only** during the evolution step. GFN0XTB # can be different from the final optimization. GFN1XTB GFN2XTB # # final optimization # NOPT 10 # a fixed number of structures to be optimized NOOPT FALSE # do not optimize any structure at all? (default FALSE)