
Fingerprint Similarity Improvement Request

richards99

Hi,

I notice that the Fingerprint Similarity node has the option to use the Tversky index, which is a modification of the Tanimoto index with alpha and beta parameters.
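For reference, the Tversky index is commonly written as below, with A the query fingerprint, B a database fingerprint, and |·| the number of on-bits. Assigning alpha to bits found only in A and beta to bits found only in B is an assumption here, chosen to agree with the alpha = 0 and beta = 0 behaviour described in this post; the node's internal convention is not stated. Setting alpha = beta = 1 recovers the Tanimoto index, because the three terms in the denominator then add up to |A ∪ B|.

```latex
S_{\mathrm{Tversky}}(A,B) = \frac{|A \cap B|}{|A \cap B| + \alpha\,|A \setminus B| + \beta\,|B \setminus A|}
\qquad
S_{\mathrm{Tversky}}(A,B)\Big|_{\alpha=\beta=1} = \frac{|A \cap B|}{|A \cup B|} = S_{\mathrm{Tanimoto}}(A,B)
```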

I am completely unsure which alpha and beta values the node uses for the Tversky index. Ideally I would like to be able to specify the values myself, as this can be really powerful. When Tversky is selected from the dropdown, could two boxes appear in the node dialog for specifying alpha and beta?

If you have a molecule A that you are searching against a set of molecules B, then:

Specifying an alpha value of 0 allows you to search for molecules that are substructures of A, i.e. the highest-scoring molecules are those whose substructures are all represented in A, without other substructures that are NOT present in A. This is useful for finding molecular fragments of A.

Specifying a beta value of 0 allows you to search for molecules that are superstructures of A, i.e. the highest-scoring molecules are those sharing the most substructures with A, regardless of whether they also contain additional functionality that is NOT present in A. This is useful for finding molecules that contain all, or nearly all, of A and more. For example, A may be a fragment with some low-level activity, and you want to find molecules that contain A plus additional chemical features.
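A minimal Python sketch of the two search modes described above, treating fingerprints as sets of on-bit indices. The helper name `tversky`, the convention that alpha weights bits found only in A and beta weights bits found only in B, and the bit sets themselves are hypothetical illustrations, not the Indigo node's implementation.

```python
from __future__ import annotations


def tversky(a: set[int], b: set[int], alpha: float, beta: float) -> float:
    """Tversky similarity of query bits `a` and candidate bits `b` (hypothetical helper).

    Assumed convention: alpha penalises bits only in the query A,
    beta penalises bits only in the candidate B.
    """
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    # Two empty fingerprints would give 0/0; return 0.0 by choice.
    return common / denom if denom else 0.0


# Hypothetical on-bit sets, not real fingerprints.
query_a = {1, 2, 3, 4, 5}
fragment = {1, 2, 3}                     # every bit also occurs in A
superstructure = {1, 2, 3, 4, 5, 6, 7}   # all of A plus extra bits

# alpha = 0: bits only in A are ignored, so fragments of A score 1.0.
print(tversky(query_a, fragment, alpha=0.0, beta=1.0))        # 1.0
print(tversky(query_a, superstructure, alpha=0.0, beta=1.0))  # 5/7, about 0.714
# beta = 0: bits only in the candidate are ignored, so superstructures score 1.0.
print(tversky(query_a, superstructure, alpha=1.0, beta=0.0))  # 1.0
print(tversky(query_a, fragment, alpha=1.0, beta=0.0))        # 0.6
```

Note that the score is asymmetric: `tversky(a, b, alpha, beta)` equals `tversky(b, a, beta, alpha)`, so the two arguments only commute when alpha equals beta.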

 

As an aside, what type of fingerprint is the Indigo fingerprint? Is it a connectivity-based fingerprint (an Extended Connectivity/Functional Class type, i.e. Morgan algorithm/Daylight type) that encodes patterns in atom connectivity, or does it assign functional groups to bits, recording TRUE or FALSE for the presence of each of many functionalities, like MACCS fingerprints?

Thanks in advance,

Simon.

Comments
Tue, 03/06/2012 - 07:30

richards99

It has been noticed that a much earlier build of Indigo had alpha and beta options in the Fingerprint Similarity node for changing these values, both set to 0.5 by default (which corresponds to the Dice similarity measure).
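A quick check that alpha = beta = 0.5 gives the Dice measure: with both weights at 0.5 the Tversky denominator |A ∩ B| + 0.5|A \ B| + 0.5|B \ A| simplifies to (|A| + |B|) / 2, so the score equals 2|A ∩ B| / (|A| + |B|). The sketch below uses sets of on-bit indices; the helper names and bit sets are hypothetical.

```python
from __future__ import annotations

import math


def tversky(a: set[int], b: set[int], alpha: float, beta: float) -> float:
    """Tversky similarity on sets of on-bit indices (hypothetical helper)."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def dice(a: set[int], b: set[int]) -> float:
    """Dice similarity: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical on-bit sets.
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6}

print(tversky(a, b, 0.5, 0.5))                           # about 0.667
print(dice(a, b))                                        # about 0.667
print(math.isclose(tversky(a, b, 0.5, 0.5), dice(a, b)))  # True
```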

Could these alpha and beta configuration boxes please be restored to the Fingerprint Similarity node for the Tversky measure?

Thanks

Simon.

Mon, 03/12/2012 - 11:57

richards99

Many thanks for quickly fixing the Tversky similarity measure.

And many thanks for the link to the detailed post on your Indigo fingerprints. This is most useful.

Simon.

Thu, 04/10/2014 - 08:27

serendip42

Hi,

I wonder why I cannot select a different similarity measure in my node: the output is always Tanimoto. The only settings I can change are:

Column with fingerprint

Column with reference fingerprint

Aggregation method

Return type

Is there something wrong with my KNIME version? I have just installed the new updates.

Wed, 01/17/2018 - 07:35

evert.homan@scilifelab.se

Hi Mikhail,

I am currently looking at Tversky similarity searching with the Indigo Fingerprint Similarity node, but I find that the results are highly dependent on the fingerprint used. I am searching for superstructures of A, so I set alpha = 1 and beta = 0, and then compared the results obtained with Indigo, RDKit, and CDK fingerprints (all with default settings).

The funny thing (or perhaps expected?) is that all three give a similarity of 1.0 as long as there are no heteroatoms in rings, but when the reference structure contains a pyridine, only the Indigo fingerprints give a similarity of 1 (as expected):

Query (SMILES)    Reference (SMILES)    Indigo    RDKit    CDK
CC1=CC=CC=C1      C1=CC=CC=C1           1         1        1
CC1=NC=CC=C1      C1=NC=CC=C1           1         0.556    0.979

To me this is a peculiar result. I had expected the similarities in the second row also to be 1.0 regardless of the fingerprint used, since the query there is also an exact superstructure of the reference, just as in the top row. Is this because the RDKit and CDK fingerprints are sensitive to the asymmetric nature of the substituted pyridine ring?

Thanks/Evert
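Assuming the node treats the reference as A and applies alpha to bits found only in A, setting alpha = 1 and beta = 0 reduces the Tversky score to |A ∩ B| / |A|: the fraction of the reference's on-bits that are also set in the query. It therefore reaches 1.0 only if every reference bit is still set in the query. Whether a given fingerprint keeps all of the reference's bits when a substituent is added depends on how it encodes atoms and their environments. The bit sets below are hypothetical and do not model the Indigo, RDKit, or CDK fingerprints.

```python
from __future__ import annotations


def tversky(reference: set[int], query: set[int], alpha: float, beta: float) -> float:
    """Tversky similarity; alpha weights bits only in the reference (assumed convention)."""
    common = len(reference & query)
    denom = common + alpha * len(reference - query) + beta * len(query - reference)
    return common / denom if denom else 0.0


# Hypothetical on-bit sets, NOT real fingerprints.
reference = {10, 11, 12, 13}                 # stand-in for the reference ring's bits
query_keeps_all = reference | {20, 21}       # substituent only adds new bits
query_loses_some = {10, 11, 20, 21, 22}      # substituent alters ring bits 12 and 13

print(tversky(reference, query_keeps_all, 1.0, 0.0))   # 1.0
print(tversky(reference, query_loses_some, 1.0, 0.0))  # 0.5
# Same value as the fraction of reference bits present in the query:
print(len(reference & query_loses_some) / len(reference))  # 0.5
```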