Selenium Nodes AdBlock Plugin

Edlueze

How can I add plugins like AdBlock and customize the WebDriver used by Selenium Nodes?

My actual problem is this: I am trying to use Selenium Nodes from China, so I continually bump up against the Great Firewall of China. The problem occurs when using Selenium Nodes to Navigate to a foreign URL. The foreign site typically uses Google Analytics and tries to load a bunch of foreign advertisements. But Google is blocked and the advertisements are extremely slow to load. As a result, the Navigate node times out and my workflow grinds to a halt! Even a timeout of 60 seconds is not sufficient to prevent this problem.

I was excited when I discovered that Opera has a built-in ad blocker. In addition, the Adblock Plus plugin allows you to add your own filters, so I could block every attempt to load anything from Google using the *google* filter. These configurations make the standalone Opera browser lightning fast!
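The *google* filter is a wildcard pattern: * stands for any run of characters, so every request URL containing "google" is blocked. As a rough illustration only, here is a simplified glob-style matcher in Java. It is not the Adblock Plus filter engine (whose syntax has many more features), and the class name WildcardFilter is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified glob-style URL filter: '*' matches any run of characters.
 * This is NOT the Adblock Plus filter engine, only an illustration of the idea.
 */
final class WildcardFilter {
    private final Pattern pattern;

    WildcardFilter(String filter) {
        // Quote each literal segment and join the segments with ".*",
        // so that only '*' acts as a wildcard (dots etc. stay literal).
        StringBuilder regex = new StringBuilder();
        String[] parts = filter.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        this.pattern = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Returns true if the whole URL matches the filter, i.e. the request would be blocked. */
    boolean blocks(String url) {
        return pattern.matcher(url).matches();
    }

    public static void main(String[] args) {
        WildcardFilter google = new WildcardFilter("*google*");
        for (String url : List.of(
                "https://www.google-analytics.com/analytics.js",
                "https://example.com/index.html")) {
            System.out.println(url + " -> " + (google.blocks(url) ? "blocked" : "allowed"));
        }
    }
}
```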

But I cannot replicate this from within KNIME using Selenium Nodes. The Selenium Nodes Opera driver (and the Chrome driver) seem to ignore all of the installed plugins. In fact, the Selenium Nodes WebDrivers seem to ignore all user settings and use only the defaults.

How can I customize the browser used by Selenium Nodes?

Are any settings saved by the Selenium Nodes browser driver? What about cookies and passwords: are they all automatically deleted when the WebDriver is closed?

Specifically regarding Opera, I installed the latest version 41.0.2352.69 but can only get halfway through the WebDriver Factory Test. Is there a more stable version I should be using?

Comments
Thu, 12/01/2016 - 02:31

qqilihq

Hi Edlueze,

thank you for your very detailed problem description and motivation. With the current version of the nodes, a browser-specific configuration is unfortunately not yet possible and the browser session will always start with blank default settings.

I've had several similar requests (also from China, by the way) to disable image loading because of the issues you highlighted, so I definitely see the problem and have this feature on my list.

Some background: at this point, the nodes provide no browser-specific configuration options. The "WebDriver Factory" is currently simply the lowest common denominator of all available browsers. The feature you describe would require an individual configuration interface and logic for each supported browser (Chrome, Firefox, Opera, etc.). This is definitely doable, but it will require substantial research, implementation and testing effort, so the ETA would probably be around the middle of next year.
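To illustrate why each browser needs its own configuration logic: the same user intent, such as "don't load images", maps to a different low-level setting in every browser. The sketch below is a hypothetical design, not the Selenium Nodes' code, and all type names are made up. The Chrome switch --blink-settings=imagesEnabled=false and the Firefox preference permissions.default.image=2 are real settings, but here they are only returned as strings; no browser is launched. It assumes Java 16+ (records, pattern matching for instanceof).

```java
import java.util.List;

/** Hypothetical per-browser settings; none of these names come from Selenium Nodes. */
interface BrowserSettings {
    boolean loadImages();
}

record ChromeSettings(boolean loadImages) implements BrowserSettings {}

record FirefoxSettings(boolean loadImages) implements BrowserSettings {}

final class SettingsTranslator {
    /** Translates one user-level intent into each browser's own low-level setting. */
    static List<String> translate(BrowserSettings settings) {
        if (settings instanceof ChromeSettings chrome) {
            // Chrome: a command-line switch passed when the browser starts.
            return chrome.loadImages() ? List.of() : List.of("--blink-settings=imagesEnabled=false");
        }
        if (settings instanceof FirefoxSettings firefox) {
            // Firefox: a profile preference (2 = block all images).
            return firefox.loadImages() ? List.of() : List.of("permissions.default.image=2");
        }
        // Every further browser (Opera, Safari, ...) needs its own branch, UI and tests.
        throw new IllegalArgumentException("Unsupported browser settings: " + settings);
    }

    public static void main(String[] args) {
        System.out.println(translate(new ChromeSettings(false)));
        System.out.println(translate(new FirefoxSettings(false)));
    }
}
```

The common interface only carries what all browsers share; everything browser-specific lives in its own branch, which is exactly the per-browser effort described above.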

A potential workaround until then could be to perform the blocking system-wide instead of just in the browser. The usual way is to use a "hosts" file, which blocks unwanted domains. An example can be found here. (Note that this is not a personal recommendation; I don't use a hosts file myself, so I cannot say how reliable it is or whether it might block too much content.)
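A hosts file maps host names to IP addresses; pointing an unwanted host at the unroutable address 0.0.0.0 typically makes requests to it fail quickly instead of waiting for a timeout. Unlike the *google* browser filter, hosts files have no wildcards, so every host name must be listed individually. The sketch below only generates the lines; the host names are examples, and you would append the output yourself (with administrator rights) to C:\Windows\System32\drivers\etc\hosts on Windows or /etc/hosts on Linux and macOS.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Generates hosts-file lines that send each listed host to the unroutable address 0.0.0.0. */
final class HostsBlocklist {
    static String toHostsEntries(List<String> hosts) {
        return hosts.stream()
                .map(String::trim)
                .filter(host -> !host.isEmpty()) // skip blank entries
                .distinct()                      // avoid duplicate lines
                .map(host -> "0.0.0.0 " + host)
                .collect(Collectors.joining(System.lineSeparator()));
    }

    public static void main(String[] args) {
        // Example host names only; hosts files do not support wildcards such as *google*.
        System.out.println(toHostsEntries(List.of(
                "www.google-analytics.com",
                "ssl.google-analytics.com",
                "www.googletagmanager.com")));
    }
}
```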

Quoting Edlueze: "Specifically regarding Opera, I installed the latest version 41.0.2352.69 but can only get halfway through the WebDriver Factory Test. Is there a more stable version I should be using?"

Could you provide me with the DEBUG log that is output when clicking the "Test" button? I assume that the test fails because of the issue you described above: the test uses the default timeout and simply tries to load the Wikipedia main page, which might fail in your case because loading is too slow or even blocked.

If the current situation is a blocker for using the Selenium Nodes, please get in touch with me via email. I can try to give you access to a pre-release version as soon as it's ready.

Philipp

Fri, 12/02/2016 - 02:47

Edlueze

Hi Philipp - great feedback as usual!

Your hosts file suggestion is a great idea. I'm using a dedicated machine so this solution should work just fine.

I typically only need to GET a single file, and my best work has been done with the HttpRetriever from the Palladian nodes. But marshalling all of the cookies and tracking the session IDs is incredibly complicated, and increasingly the Selenium nodes are the best solution. Being able to disable images and everything else that is unnecessary to load would be a great improvement.
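To make "marshalling cookies" concrete: an HTTP client has to store every Set-Cookie header it receives and send the matching cookies back in a Cookie header on later requests, which is how a session ID is carried along. The sketch below does this with the JDK's java.net.CookieManager, without any network access. It is not how the Palladian HttpRetriever works; the URL and the SESSIONID cookie are made-up examples.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

/** Illustrates cookie "marshalling" with the JDK's CookieManager; no network access involved. */
final class CookieReplayDemo {
    /**
     * Records the Set-Cookie headers of a response from {@code uri} and returns the
     * Cookie header values that should accompany the next request to the same URI.
     */
    static List<String> cookieHeadersForNextRequest(CookieManager manager, URI uri, List<String> setCookieHeaders) {
        try {
            // Store cookies as if they had arrived in a response from `uri`.
            manager.put(uri, Map.of("Set-Cookie", setCookieHeaders));
            // Ask which cookies must be sent back on a follow-up request.
            Map<String, List<String>> requestHeaders = manager.get(uri, Map.of());
            return requestHeaders.getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default policy: accept cookies only from the server that set them.
        CookieManager manager = new CookieManager();
        URI login = URI.create("https://www.example.com/login");
        System.out.println(cookieHeadersForNextRequest(manager, login, List.of("SESSIONID=abc123; Path=/")));
    }
}
```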

Regarding my Opera version, the Opera browser starts up and is fully functional (I can use it to browse) but it never even tries to test the Wikipedia page - it just sits forever with the message "Starting org.openqa.selenium.opera.OperaDriver". The DEBUG log is almost empty but I also copied down the text from the pop-up:

DEBUG AbstractWebDriverFactoryWithBinary            Using included binary at path: C:\Program Files\KNIME_3.2.0\plugins\ws.palladian.nodes.selenium.driver.win64_1.0.0.201611051007\binaries\operadriver.exe

Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.
Build info: version: 'unknown', revisions: 'unknown', time: 'unknown'
System info: host: 'France', ip: '192.168.0.101', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_60'
Driver info: driver.version: OperaDriver

Fri, 12/02/2016 - 04:17

Edlueze

I forgot to ask a clarifying question. You said "the browser session will always start with blank default settings". It sounds like that includes all of the cookies and passwords. But where are those things all stored? In Selenium's own temporary folder? Or in my Users folder? Can you confirm that these are always deleted at the end of each session?

Sat, 12/03/2016 - 08:28

qqilihq

Hi Edlueze,

thank you again for reporting the Opera issue. I have just tried the most recent Opera version and can confirm the problem. It is caused by the fact that recent Opera versions do not work with the OperaChromiumDriver, which handles the communication between the browser and Selenium. Apparently, the Opera developers no longer maintain that driver, so there's no easy fix.

There are some potential workarounds, which I will investigate as soon as I have some spare time. In the meantime, I suggest using a different browser, such as Chrome, instead. (If you need to run Opera, versions up to 32 should work fine.) I will update the node FAQs and documentation to reflect this information.
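Reflecting the advice at the time (Opera up to version 32 works with the driver, while 41.0.2352.69 does not), here is a hypothetical helper that checks an installed version string before choosing Opera in a workflow. The threshold is taken from this post only and no longer applies once Opera support for current versions was restored.

```java
/** Hypothetical helper reflecting the advice in this thread at the time: Opera up to 32 works. */
final class OperaVersionCheck {
    static final int LAST_KNOWN_GOOD_MAJOR = 32;

    /** Extracts the major version, e.g. 41 from "41.0.2352.69". */
    static int majorVersion(String version) {
        String trimmed = version.trim();
        int dot = trimmed.indexOf('.');
        return Integer.parseInt(dot < 0 ? trimmed : trimmed.substring(0, dot));
    }

    /** True if the major version is at or below the last version reported to work. */
    static boolean isKnownToWork(String version) {
        return majorVersion(version) <= LAST_KNOWN_GOOD_MAJOR;
    }

    public static void main(String[] args) {
        String installed = "41.0.2352.69"; // the version reported in this thread
        System.out.println(installed + " known to work: " + isKnownToWork(installed));
    }
}
```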

Concerning your question about the profile data:

Quoting Edlueze: "I forgot to ask a clarifying question. You said 'the browser session will always start with blank default settings'. It sounds like that includes all of the cookies and passwords. But where are those things all stored? In Selenium's own temporary folder? Or in my Users folder? Can you confirm that these are always deleted at the end of each session?"

Yes, currently the browsers always start with an empty profile in a temporary location (later versions of the nodes will likely let you optionally reuse an existing profile). You can verify this location manually: for Chrome, for example, enter "chrome://version/" in the address bar and you will get some technical information. The last line lists your profile path. In my case (running on a Mac), it is something like:

"/private/var/folders/nz/b5yrrbd51hzcwxng_nf1m9n00000gn/T/.org.chromium.Chromium.WHXNgG/Default"

Each additional browser instance gets its own profile path. The temporary profile directory is automatically removed when the browser instance is quit through KNIME or when KNIME itself is quit.
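The lifecycle described above (one fresh, uniquely named profile directory per browser instance, deleted on quit) can be mimicked with plain java.nio.file calls. This is a hypothetical illustration of the pattern, not the code Selenium or KNIME actually runs; the class name and the "Cookies" file are stand-ins for real browser session data.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical throwaway profile: one fresh temp directory per instance, deleted on close. */
final class TemporaryProfile implements AutoCloseable {
    private final Path dir;

    TemporaryProfile() {
        try {
            // createTempDirectory picks a unique name, so every instance gets its own path.
            dir = Files.createTempDirectory("browser-profile-");
            // Stand-in for session data (cookies etc.) a browser would write.
            Files.writeString(dir.resolve("Cookies"), "SESSIONID=abc123");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path path() {
        return dir;
    }

    /** Deletes the directory tree, deepest entries first, so directories are empty when removed. */
    @Override
    public void close() {
        try (Stream<Path> walk = Files.walk(dir)) {
            List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (TemporaryProfile first = new TemporaryProfile();
             TemporaryProfile second = new TemporaryProfile()) {
            System.out.println(first.path());
            System.out.println(second.path()); // differs from the first path
        } // both directories are deleted here
    }
}
```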

Sun, 12/04/2016 - 01:20

qqilihq

Good news: Opera functionality for current versions has been restored. An update of the Selenium Nodes will be released next week.