KNIME Cheminformatics Workshop September 2014

The 2nd KNIME Cheminformatics Workshop was hosted by Vernalis in Cambridge on September 29th 2014. Thanks to Steve R and James L for taking notes.

Topics

Items from last workshop & What's cooking in KNIME

Workflow versioning

Dave Morley raised the need for a "real" versioning mechanism (GitHub, Mercurial, SVN etc.), and most other users agreed.

  • Various users reported having investigated using SVN or Mercurial in KNIME
  • Thorsten explained that the KNIME explorer view does not show the Eclipse Project explorer "add-ons" – e.g. commit status
  • Need an Import from … repository option under import workflows

Internal Webservice access

  • Lilly and Vernalis have independently implemented generic frameworks to access their internal webservices
  • Different companies have different architectures, so probably these will remain bespoke
  • James L / Steve R may be able to liaise around shared issues

Matched Pairs

  • Identified as important in April meeting
  • Both Vernalis and Lilly are working on new MMP nodes, with a view to release. The functionality of the two sets of nodes is complementary (Vernalis – multi-cut, no data columns, split fragmentation and MMP generation)
  • Lilly nodes are currently in legal approval
  • Vernalis nodes are in advanced development and nearly release-ready, but again require approval
  • Steve R / James L should liaise to ensure overlap is avoided

Vernalis Nodes

  • See Steve's presentation
  • See Matched Pairs
  • Timed loops (run for / to)
  • Pause nodes (wait for / to)
  • Interest in Vernalis Timed loop / delay nodes – we will seek permission to release ASAP
  • Ertl Scaffold keys (in RDKit)
  • Co-ordinate manipulations in RDKit (Rotate about axes, align to principal axes etc)
  • Read/Write Variables nodes

Lhasa Nodes

Sam W described some of the internal Lhasa Limited nodes, and interest was expressed in a number of them.

  • Multi-column row filter node
  • BitSet manipulation (see below)
  • Tools
  • Viewers
  • Weka
  • Many calculators
  • and others...

Molecular Visualisers

  • A lot of interest in this, for either small molecules or macromolecules
  • Richard S has been using GLMol
  • Richard S / James L / Dave M (& others?) to share experiences with various viewers
  • Maybe easier with the new JS-based views and quickforms
  • James L showed the multi-molecule sketcher from Lilly. Everyone else wants it too!

Bitvector OR / AND / XOR / NOT

  • Nodes needed – SDR to investigate which nodes were in development at Vernalis, as we already have one fingerprint node released – these are relatively easy pickings to add to our contribution!
  • Lhasa have these nodes for BitSets, but BitVector column implementations would be useful
  • Are there BitVector aggregators in the GroupBy node? SDR will look into adding aggregators for them too.
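
For reference, the four operations discussed above can be sketched on plain java.util.BitSet objects. This is only an illustration of the logic, not the Lhasa or Vernalis implementation and not the KNIME BitVector API; the class name BitSetOps and the explicit length argument for NOT are assumptions made for this example.

```java
import java.util.BitSet;

/** Hypothetical helper class (name assumed for this sketch). */
final class BitSetOps {
    private BitSetOps() {}

    /** Bits set in both a and b; the inputs are not modified. */
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    /** Bits set in a, in b, or in both. */
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    /** Bits set in exactly one of a and b. */
    static BitSet xor(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.xor(b);
        return result;
    }

    /**
     * Flips bits 0 .. length-1. A BitSet has no fixed width, so the
     * fingerprint length must be supplied (assumption of this sketch).
     * Bits at or beyond length are left untouched.
     */
    static BitSet not(BitSet a, int length) {
        BitSet result = (BitSet) a.clone();
        result.flip(0, length);
        return result;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0);
        a.set(2);
        BitSet b = new BitSet();
        b.set(2);
        b.set(3);

        System.out.println(and(a, b));  // {2}
        System.out.println(or(a, b));   // {0, 2, 3}
        System.out.println(xor(a, b));  // {0, 3}
        System.out.println(not(a, 4));  // {1, 3}
    }
}
```

Each method works on a copy, so the input fingerprints are left unchanged. A GroupBy aggregator would presumably fold one of the binary operations over all fingerprints in a group.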

HELM & Biosequences

  • Vernalis looking at DNA / RNA / Protein Sequences based on corresponding BioJava sequence objects – still a work-in-progress
  • James L suggested possible Lilly contacts
  • Ultimately, depends on use-case
  • Some discussion around a HELM or xHELM datatype
    • Would need careful definition of what is required for the types
    • Utility?
    • Viewer?
  • Possible topic for KNIME partner meeting?

Node usage statistics

Lilly have implemented extensive logging of node configure / execution events – might be able to share stats or implementation details?

Wishlist

  • Easier node creation for developers
    • A graphical interface for drag-and-drop dialog creation, which also handles settings models in the NodeDialog and NodeModel classes
    • Alternatively, an Add SettingsModel... menu option, which generates the creation / load / save / validate methods automatically to reduce typing?
    • Wizard for column rearranger or full-blown execute methods
    • Enhancement to current wizard, which asks for number of ports, number of views, port types, and provides appropriately ‘adjusted’ code in the node model / node factory classes
  • Workflow preferences
    • E.g. default settings for automatic chemistry type conversions
  • Fulltext search in node repository
  • Integration of GLMol in WebPortal
  • More than one report per workflow
  • Reports distributable in meta/subnodes
  • Easier molecule input quickform (currently needs at least two nodes)
  • "Linked" workflows in local workspace that can be synchronized with server
  • Select all nodes between two selected nodes in workflow editor