Newbie here! I have 2 sdf files (37K and 193K unique entries respectively). I would like to figure out the molecular overlap between the two libraries to come up with a common chemical set that encompasses all of the underlying molecular diversities.
Could someone help/guide?
Here's one way to do it - you will need the Chemistry Addons and RDKit Community nodes installed for this solution.
Take 2 SDF Reader nodes and read each of the SD files you want to compare into one of them.
Hook up an RDKit CANON Smiles node to the output of each SDF Reader node and run them with default settings.
Connect the output of each RDKit CANON Smiles node to one of the two inputs of a single Reference Row Filter node, and configure it to use the Canonical (Molecule) columns from both tables for filtering. This should leave you with a table containing only the molecules present in both SD files.
You may need to put the RDKit Salt Stripper node before each RDKit CANON Smiles node, in case your SD files contain counter ions.
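If you would rather script this than build a workflow, the same idea (canonical SMILES as the comparison key, then keep what appears in both sets) can be roughly sketched in Python with the RDKit toolkit. This is only a sketch under assumptions: the function names and the file names in the usage note are made up for illustration, RDKit has to be installed separately, and the salt stripping uses RDKit's default salt definitions, which may not match what the KNIME Salt Stripper node removes.

```python
from typing import Iterable, Set


def canonical_smiles_from_sdf(path: str, strip_salts: bool = False) -> Set[str]:
    """Read an SD file and return the set of canonical SMILES it contains.

    Hypothetical helper for illustration; not part of KNIME or RDKit.
    """
    # RDKit is imported lazily so the plain set logic below works without it.
    from rdkit import Chem
    from rdkit.Chem.SaltRemover import SaltRemover

    remover = SaltRemover() if strip_salts else None
    keys: Set[str] = set()
    for mol in Chem.SDMolSupplier(path):
        if mol is None:
            # RDKit yields None for records it cannot parse; skip them.
            continue
        if remover is not None:
            mol = remover.StripMol(mol)
        smiles = Chem.MolToSmiles(mol)  # canonical by default
        if smiles:
            keys.add(smiles)
    return keys


def overlap(keys_a: Iterable[str], keys_b: Iterable[str]) -> Set[str]:
    """Return the keys present in both collections (set intersection)."""
    return set(keys_a) & set(keys_b)
```

With two files named, say, `library_a.sdf` and `library_b.sdf` (placeholder names), `overlap(canonical_smiles_from_sdf("library_a.sdf"), canonical_smiles_from_sdf("library_b.sdf"))` would give the canonical SMILES found in both. Like the workflow above, this treats two entries as the same molecule only if their canonical SMILES strings match exactly, so different tautomers or charge states will still count as different.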
I did as you mentioned, any comments on this?
I think what Evert meant was to connect the first CANON node to the first input of the Reference Row Filter, and the second CANON node to the second input, not using two Reference Row Filters. ;-)