
Weighted Regression and other Analyses

sgchase

I often work with survey data in which each record (observation) represents a different proportion of the overall population, quantified by a weight variable. The weight has to be taken into account both when computing simple statistics such as averages, where it applies to a single variable, and in predictive modeling, where it applies to an entire observation.

I've searched the community forums and haven't found a simple way to perform either kind of weighted analysis. For variable weighting, I saw a suggestion to use Erlwood's Desirability Ranking node, but I could not follow the explanation of how to set it up. For observation weighting, I saw a suggestion to use the one-to-many rows node to increase the number of observations in proportion to the weight variable. Unfortunately, that makes some of my data sets enormous and the analysis impractical.

Any help with either of these issues would be greatly appreciated.

Comments
Wed, 02/28/2018 - 10:01

agaunt

I think the R Snippet node is what you need here. For weighted linear regression, you can use the following code, adapted to your data:

# Replace the placeholder column names (target, covariate1, covariate2, weight) with your own; add further covariates as needed.
fit <- lm(target ~ covariate1 + covariate2, data = knime.in, weights = weight)
knime.out <- data.frame(knime.in, your_prediction = fitted(fit))
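
As a rough illustration of why the weights argument avoids the row-duplication workaround mentioned in the question, here is a minimal sketch on made-up data. The data frame survey and its columns x, y and w are hypothetical and not from the original post. It computes a weighted average with base R's weighted.mean and checks that, for integer weights, a weighted lm gives the same coefficients as fitting on rows duplicated w times.

```r
# Made-up survey data: y is the outcome, x a covariate, w an integer weight.
# All names and values here are illustrative assumptions.
survey <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2),
  w = c(1, 3, 2, 5, 1, 4)
)

# Weighted average of a single variable, without enlarging the table.
wmean <- weighted.mean(survey$y, survey$w)

# Weighted regression: each observation counts in proportion to w.
fit_w <- lm(y ~ x, data = survey, weights = w)

# The same analysis done by duplicating each row w times.
expanded <- survey[rep(seq_len(nrow(survey)), times = survey$w), ]
fit_rep <- lm(y ~ x, data = expanded)

# Both comparisons should report TRUE: the point estimates agree.
print(all.equal(coef(fit_w), coef(fit_rep)))
print(all.equal(wmean, mean(expanded$y)))
```

Only the point estimates are compared here. The standard errors of the two fits differ, because the duplicated table pretends to have more observations than were actually collected.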

Thu, 03/01/2018 - 11:58

sgchase

That's great! Thanks! I see your code creates data output with the prediction variable as the last column. I've experimented and found I can use the R Learner node to save the results of the model itself, which I then can combine with data in an R Predictor node to obtain the same dataset. That's pretty cool.
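
For anyone trying the same split, this is roughly what the two steps look like in plain R. It is a sketch under assumptions: the column names and values are made up, and knime.model is the name I believe the R Learner and R Predictor nodes use for the model object, so check the node's default script before relying on it.

```r
# Stand-in for the table KNIME passes to the node as knime.in
# (made-up values; column names are illustrative assumptions).
knime.in <- data.frame(
  target     = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2),
  covariate1 = c(1, 2, 3, 4, 5, 6),
  weight     = c(0.5, 1.5, 1.0, 2.5, 0.5, 2.0)
)

# Learner step: fit the weighted model and keep the model object itself.
# knime.model is assumed to be the variable the R Learner node exports.
knime.model <- lm(target ~ covariate1, data = knime.in, weights = weight)

# Predictor step: apply the stored model to a table and append the
# predictions as the last column.
knime.out <- cbind(
  knime.in,
  your_prediction = predict(knime.model, newdata = knime.in)
)

print(knime.out)
```

Keeping the model as a separate object means it can be applied to a different table later without refitting.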

Fri, 03/02/2018 - 08:43

agaunt

Your solution is clearly more elegant! I'm glad I could help you find the way. ;-)