I would like to compare molecule files A and B and find all molecules in file B that have a Tversky similarity >0.8 to any molecule in file A, using structural fingerprints for comparison.
Any tips for an efficient workflow are highly appreciated!
I think that the following workflow should do what you need, with the following caveats.
a) The row splitter simulates File A and File B (I didn't have two SDF files with similar enough compounds).
b) The similarity calculated here is Tanimoto. If you really need Tversky, I think you'll need to use the Java Distance node to define the Tversky distance, and pass its output port into the Similarity Search node.
The Indigo 2 fingerprint similarity node will directly calculate a Tversky similarity for you.
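For anyone defining the measure by hand (for example in a custom distance node), here is a minimal Python sketch of the Tversky index on fingerprints stored as sets of on-bit positions. The function name `tversky` and the toy fingerprints are made up for illustration, and the weights alpha and beta are left as parameters because the thread does not say which values are wanted. This is not the code behind any of the nodes mentioned here. With alpha = beta = 1 the measure reduces to Tanimoto.

```python
def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Tversky index of two fingerprints given as sets of on-bit positions.

    T(a, b) = |a & b| / (|a & b| + alpha * |a - b| + beta * |b - a|)
    """
    common = len(a & b)   # bits set in both fingerprints
    only_a = len(a - b)   # bits set only in a
    only_b = len(b - a)   # bits set only in b
    denom = common + alpha * only_a + beta * only_b
    # Return 0.0 when the denominator vanishes (a choice made for this sketch).
    return common / denom if denom else 0.0


# Hypothetical toy fingerprints: ref is a substructure of query.
ref = {1, 2, 3, 4}
query = {1, 2, 3, 4, 5, 6}

print(tversky(ref, query, 1.0, 1.0))  # Tanimoto case: 4 / (4 + 0 + 2)
print(tversky(ref, query, 1.0, 0.0))  # extra bits in query are ignored
```

Because the measure is asymmetric whenever alpha differs from beta, it matters which molecule is treated as the reference and which as the query.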
I managed to combine the two proposed solutions and made a workflow that generates a Tversky similarity column for each of the reference molecules, by looping over them one at a time. This is still surprisingly fast.
Hopefully this is useful for other users.
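Outside of KNIME, the same loop-over-references idea can be sketched in plain Python. This is only a rough analogue of the workflow, not an export of it: the fingerprints are hypothetical 8-bit integers, `tversky_bits` and `hits_above` are names invented here, and alpha = beta = 0.5 is an arbitrary choice because the weights are not stated in the thread. Instead of one similarity column per reference, the sketch keeps the best score per molecule of file B.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tversky_bits(a: int, b: int, alpha: float, beta: float) -> float:
    """Tversky index of two fingerprints stored as integer bit vectors."""
    common = popcount(a & b)
    only_a = popcount(a & ~b)
    only_b = popcount(b & ~a)
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0


def hits_above(reference_fps: dict, candidate_fps: dict,
               threshold: float, alpha: float, beta: float) -> dict:
    """Return candidates whose best similarity to any reference exceeds threshold."""
    hits = {}
    for name, fp in candidate_fps.items():
        # Loop over every reference and keep the highest score.
        best = max(
            (tversky_bits(ref, fp, alpha, beta) for ref in reference_fps.values()),
            default=0.0,
        )
        if best > threshold:
            hits[name] = best
    return hits


# Hypothetical stand-ins for the fingerprints of file A and file B.
file_a = {"a1": 0b11110000, "a2": 0b00001111}
file_b = {"b1": 0b11110001, "b2": 0b10101010, "b3": 0b00001111}

hits = hits_above(file_a, file_b, threshold=0.8, alpha=0.5, beta=0.5)
print(sorted(hits))  # names of the file B molecules that pass the threshold
```

The cost grows with the product of the two file sizes, so for large files a bulk similarity routine from a cheminformatics toolkit would likely be the better choice.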