I have some SMILES and would like to count several substructures using the RDKit Substructure Counter Node.
Usually I have just compared the SMILES to SMARTS that I already had, but now I would like to draw my own SMARTS.
However, I stumbled upon some issues. For example, drawing benzene in MarvinSketch and converting it to SMARTS format via MolConverter does not seem to work. I have several SMILES that contain benzene, but I just don't get any matches using the RDKit Substructure Counter. By converting the drawn structures to SMILES format instead I do get matches, but - in another example - the drawn structure for aniline (-NH2) will also match nitrobenzene (-NO2), as if the hydrogens were simply ignored.
Do I have to change any settings? Or is my approach completely wrong perhaps? Any alternatives in this case?
Help is much appreciated!
You can generate SMARTS (or Smiles) directly from MarvinSketch by changing the output format (see image). So if nothing else you should be able to skip the MolConvert step. Maybe this improves the searching behavior?
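One possible reason for the missing benzene matches, offered as an assumption because the thread does not show the exported SMARTS: a ring drawn with alternating single and double bonds may be exported as a Kekulé-style SMARTS, and its explicit `-` and `=` bonds do not match the aromatic bonds that RDKit perceives in the target molecule. Below is a minimal sketch using the RDKit Python API rather than the KNIME node. Both SMARTS strings and the `count_matches` helper are illustrative and are not taken from actual MarvinSketch output.

```python
from rdkit import Chem


def count_matches(smiles: str, smarts: str) -> int:
    """Count unique substructure matches of a SMARTS pattern in a molecule given as SMILES."""
    mol = Chem.MolFromSmiles(smiles)    # sanitization perceives aromaticity
    query = Chem.MolFromSmarts(smarts)  # SMARTS is taken literally, no aromatization
    if mol is None or query is None:
        raise ValueError("could not parse SMILES or SMARTS")
    return len(mol.GetSubstructMatches(query))


# Hypothetical Kekulé-style export: explicit single and double ring bonds.
kekule_smarts = "[#6]1=[#6]-[#6]=[#6]-[#6]=[#6]-1"
# Aromatic form: lowercase atoms, aromatic bonds implied.
aromatic_smarts = "c1ccccc1"

toluene = "Cc1ccccc1"
print("Kekulé SMARTS:  ", count_matches(toluene, kekule_smarts))
print("aromatic SMARTS:", count_matches(toluene, aromatic_smarts))
```

If the Kekulé pattern gives zero hits while the aromatic one gives one, exporting the query in aromatic form (or writing the SMARTS by hand) would be worth trying.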
SMILES usually do not contain explicit hydrogens, so if you do a substructure search on aniline defined as SMILES
you will get nitrobenzene as a hit. If you want to exclude this, you need to define a more specific query by explicitly attaching two hydrogens to the nitrogen:
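The difference can be sketched with the RDKit Python API (not the KNIME node itself). The SMILES and SMARTS strings below are illustrative choices, not the ones from the original post: `Nc1ccccc1` stands for the query without hydrogen constraints, and `[NH2]c1ccccc1` is one way to require exactly two hydrogens on the nitrogen.

```python
from rdkit import Chem

# Target molecules
aniline = Chem.MolFromSmiles("Nc1ccccc1")
nitrobenzene = Chem.MolFromSmiles("O=[N+]([O-])c1ccccc1")

# Query built from SMILES: hydrogen count and charge are not part of the match criteria.
loose_query = Chem.MolFromSmiles("Nc1ccccc1")
# Query built from SMARTS: the nitrogen must carry exactly two hydrogens.
strict_query = Chem.MolFromSmarts("[NH2]c1ccccc1")

for name, mol in (("aniline", aniline), ("nitrobenzene", nitrobenzene)):
    print(
        name,
        "loose:", mol.HasSubstructMatch(loose_query),
        "strict:", mol.HasSubstructMatch(strict_query),
    )
```

With the loose query both molecules should be reported as hits, while the strict query should match only aniline.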
thank you very much!
Just one more question: using MarvinSketch I do not actually get the SMILES code but simply the drawn molecule (converted to SMI). Is there any way to show the structure as a SMILES code instead? Or is there a way to make those hydrogens explicit in MarvinSketch?
In the results table you can right-click on the Smiles column header and then choose 'String' instead of Smiles as the 'Available Renderer'. However, you need to do this each time you reopen the results table.
If you always want to see the actual Smiles strings you need to change the global rendering preferences for Marvin Smiles output (see image).
Thanks a lot - this was really helpful!!