Substructure Match Counter question

James Davidson

Hi,

I should start by saying "great job" with the newest release - thanks!

Now on to a question about the Substructure Match Counter node. I have been successfully using it to count, e.g., Ns and Os with at least one H [#7&!H0, #8&!H0], but I wanted to run some more 'Lipinski-like' bond counting. So I used the Hydrogen Adder node to make sure all of the implicit Hs were present as explicit atoms in the molecule graphs, then used [#7,#8]-[#1] as my query molecule.

However, the Substructure Match Counter only seems to count once, not twice, in the case of NH2 groups...
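
To make the two expected counts concrete, here is a minimal Python sketch. It is a toy model only: the tuple representation and the function names are assumptions made for illustration, and it does not reflect how Indigo or the KNIME node is implemented. It contrasts counting N/O atoms that carry at least one H with counting individual N-H and O-H bonds.

```python
# Toy model (assumption): each heavy atom is (atomic_number, attached_hydrogen_count).
# Illustration only; this is not how Indigo or the KNIME node works internally.

def count_atoms_with_h(atoms, elements=(7, 8)):
    """Roughly what [#7&!H0,#8&!H0] asks for: one match per N/O atom with at least one H."""
    return sum(1 for z, h in atoms if z in elements and h > 0)

def count_xh_bonds(atoms, elements=(7, 8)):
    """Roughly what [#7,#8]-[#1] asks for: one match per N-H or O-H bond."""
    return sum(h for z, h in atoms if z in elements)

# Ethanolamine (SMILES: NCCO): NH2, CH2, CH2, OH
ethanolamine = [(7, 2), (6, 2), (6, 2), (8, 1)]

print(count_atoms_with_h(ethanolamine))  # 2 (the N and the O)
print(count_xh_bonds(ethanolamine))      # 3 (two N-H bonds plus one O-H bond)
```

In this model an NH2 group contributes one atom match but two bond matches, which is the difference I expected to see.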

I am guessing this isn't the expected behaviour (but am also guessing it is down to how hydrogens get treated somewhere down the line!)?

 

Kind regards

James

Comments
Wed, 09/28/2011 - 10:48

mikhail.rybalkin

Hello James,

For our substructure search algorithm it makes no difference whether hydrogens are folded or unfolded, because implicit and explicit hydrogens are treated identically. This is why we don't support the 'h' notation in SMARTS and support only the 'H' notation.

In substructure search, all pure hydrogens in the query molecule were ignored, because they make no difference there. For counting the number of matches, however, they do matter, and it is a bug in Indigo that we overlooked this. It will be fixed. For now, you can trick our substructure search algorithm by masking the hydrogen as a more complex expression. For example, you can specify [#7,#8]-[#1,#112] (provided you don't have Copernicium in your molecules). In this case the number of matches should be correct.

You also do not need to unfold hydrogens. The number of matches should be the same for structures with folded and unfolded hydrogens, because unfolding is done internally and automatically when necessary. (It would be good if you could verify this on your dataset.)
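
The folded/unfolded equivalence can be pictured with a small Python sketch. This is a toy graph model under stated assumptions (the `unfold` and `count_xh_bonds` helpers and the tuple layout are hypothetical, not part of Indigo); it only illustrates why a bond count that considers both implicit and explicit hydrogens gives the same result for either representation.

```python
# Toy model (assumption): atoms are (atomic_number, implicit_h_count),
# bonds are pairs of atom indices. This is not Indigo code.

def unfold(atoms, bonds):
    """Turn every implicit hydrogen into an explicit H atom bonded to its parent."""
    new_atoms = [(z, 0) for z, _ in atoms]
    new_bonds = list(bonds)
    for i, (_, h) in enumerate(atoms):
        for _ in range(h):
            new_atoms.append((1, 0))
            new_bonds.append((i, len(new_atoms) - 1))
    return new_atoms, new_bonds

def count_xh_bonds(atoms, bonds, elements=(7, 8)):
    """Count N-H and O-H bonds, whether the hydrogens are implicit or explicit."""
    # Implicit hydrogens sitting on N or O
    total = sum(h for z, h in atoms if z in elements)
    # Explicit hydrogens bonded to N or O
    for a, b in bonds:
        za, zb = atoms[a][0], atoms[b][0]
        if (za in elements and zb == 1) or (zb in elements and za == 1):
            total += 1
    return total

# Methylamine (SMILES: CN): CH3 and NH2
atoms = [(6, 3), (7, 2)]
bonds = [(0, 1)]

folded = count_xh_bonds(atoms, bonds)
unfolded = count_xh_bonds(*unfold(atoms, bonds))
print(folded, unfolded)  # 2 2
```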

So, we will fix it soon, and as a temporary solution you can use [#7,#8]-[#1,#112] as a query.

Best regards,
Mikhail

Fri, 01/20/2012 - 07:56

mikhail.rybalkin

Hello James,

This bug has been fixed in the nightly builds.

I understand that it took a very long time to fix this bug, but the problem may still be relevant.

Best regards,
Mikhail