Dear RDKit Nightly Build Users,
I would like to inform you about a major improvement available for testing now in the RDKit Nightly Build.
In the past a SMILES or SDF molecule column had to be converted into the RDKit Mol format first using the “Molecule to RDKit” node before using most of the RDKit nodes. Old workflows will still work the same way as before, but this explicit conversion is no longer necessary when using the updated RDKit Plugin 2.2.0. Simply use all RDKit nodes directly on SMILES and SDF columns and the necessary conversion will be applied automatically.
If you find the time, please test it over the coming month and let me know if you encounter any issues. Your feedback is, as always, highly appreciated. :-) Thank you!
Well, this is a welcome improvement, and I am sure it will help a lot of novice users who can never understand the need for so many structure format conversions.
I have tried my best to break the nodes or find a bug, but so far no luck ;-)
One question I do have is whether it would be better to have a node's output in the same format as its input, i.e. SDF in and SDF out. This would certainly be even more user-friendly, but I can see drawbacks too: presumably extra code per node and extra CPU overhead. I don't know which way is best, but I thought I would throw the question out there.
Not needing an explicit conversion is only one of the new features. The other is that the output table contains a kind of combined column that holds several representations of a molecule in one cell. Initially there is only one representation, for example SDF. Once the first node has processed the table, the same column also contains the RDKit representation in all cells (in addition to the original SDF). If you then use CDK (well, not yet), it will add a third representation. This ensures that each conversion needs to be performed only once.
If a node alters the molecule, though, the other representations are (or at least should be) thrown away.
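To make the combined-column idea concrete, here is a minimal Python sketch of a cell that keeps several representations and converts each one at most once. It is only an illustration of the behaviour described above, not the actual KNIME implementation: the class `MultiRepresentationCell`, the `CONVERTERS` registry, and the string-tagging "converters" are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Tuple

# Hypothetical converter registry: (source format, target format) -> function.
# These stand-ins only tag the string; real converters would call a toolkit.
CONVERTERS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("sdf", "rdkit"): lambda value: f"rdkit({value})",
    ("sdf", "cdk"): lambda value: f"cdk({value})",
}


class MultiRepresentationCell:
    """One table cell holding several representations of the same molecule."""

    def __init__(self, fmt: str, value: str) -> None:
        self.original_format = fmt
        self.representations: Dict[str, str] = {fmt: value}
        self.conversions_run = 0  # counts real conversions, for illustration

    def get(self, fmt: str) -> str:
        """Return the requested representation, converting only if it is missing."""
        if fmt not in self.representations:
            convert = CONVERTERS[(self.original_format, fmt)]
            original = self.representations[self.original_format]
            self.representations[fmt] = convert(original)
            self.conversions_run += 1
        return self.representations[fmt]

    def altered(self, fmt: str, new_value: str) -> "MultiRepresentationCell":
        """A node changed the molecule: the new cell keeps only the new form."""
        return MultiRepresentationCell(fmt, new_value)


cell = MultiRepresentationCell("sdf", "benzene.sdf")
cell.get("rdkit")  # first RDKit node: converts and stores the result
cell.get("rdkit")  # later RDKit node: served from the cell, no conversion
cell.get("cdk")    # a CDK node adds a third representation
modified = cell.altered("rdkit", "rdkit(benzene.sdf) + H")
```

After these calls, `cell` holds three representations but only two conversions were run, and `modified` holds just the altered RDKit form, because the older SDF and CDK forms would no longer describe the same molecule.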
This is really great and makes life so much easier. I was playing with the substructure counter node and realized that the bottom input doesn't seem to automatically translate the molecule to RDKit format. The top input received RDKit format as I had already calculated some properties using RDKit, but initially I input SDF directly into the substructure counter node for the query molecules and the node was unhappy. If you have SDF on both inputs it works fine though. So maybe some internal check is needed to check for similar input formats and if necessary translate one of the inputs.
Thanks for reporting this bug. I was able to reproduce it thanks to your detailed description. I will have a closer look and provide you with an update soon.
I fixed the bug you encountered and generated a new build. This bug would have occurred in all RDKit nodes that have a second or third input table with molecule columns that require conversion to RDKit Mol columns.
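As a rough picture of what the fix amounts to, the sketch below applies the same conversion check to every input table rather than only the first. This is a guess at the general shape, not the actual node code: the `Column` model, `ensure_rdkit`, and `prepare_inputs` are hypothetical names, and the conversion is a string-tagging stand-in.

```python
from typing import List, Tuple

# Hypothetical model of a molecule column: (format tag, cell values).
Column = Tuple[str, List[str]]


def ensure_rdkit(column: Column) -> Column:
    """Convert a SMILES or SDF column to the stand-in RDKit format if needed."""
    fmt, values = column
    if fmt == "rdkit":
        return column  # already in the right format, nothing to do
    if fmt in ("smiles", "sdf"):
        # Stand-in conversion; a real node would parse each molecule here.
        return ("rdkit", [f"rdkit({v})" for v in values])
    raise ValueError(f"unsupported molecule format: {fmt}")


def prepare_inputs(columns: List[Column]) -> List[Column]:
    """Run the check on every input port, not just the first one."""
    return [ensure_rdkit(col) for col in columns]


top = ("rdkit", ["rdkit(mol1)"])  # already converted by an upstream node
bottom = ("sdf", ["query1.sdf"])  # query table still in SDF
prepared = prepare_inputs([top, bottom])
```

With a mixed pair like this, the top column passes through unchanged and the bottom one is converted, which matches the RDKit-on-top, SDF-on-bottom case reported above.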
Please update KNIME to the latest nightly build (actually a "noon" build, version RDKit KNIME integration 18.104.22.168307161001) and check whether it works for you. Thanks again for testing it.
Thanks for implementing this! It was a missing feature I think, not a bug ;)
I tested it both ways, RDKit input at top, SDF at bottom, and vice versa, and both work now.
Following that, the CDK-KNIME plug-in now supports this feature in the nightly build. RDKit representations are automatically converted to CDK representations and vice versa.