I am meditating on doing a paper on GLM methods. Example script 1. Also, see my GLM results for WG trees before and after model selection.

In GLMs there are **associations/collinearities** and **interactions**, and they are not the same:

1. "A is associated with B" means the opposite of "A and B are independent/orthogonal". This is variously called an “association” or a “collinearity”.

2. An interaction A:B is significant with respect to a response variable Z if the dependence of Z on A is modified by a change in the value of B, and/or the dependence of Z on B is modified by a change in the value of A.

Associations are tested for simply by performing pairwise correlation tests on your predictor variables. The tolerance (how “correlated” counts as “correlated”) is generally accepted to be quite loose: A and B can be fairly correlated (e.g. r = 0.7) and still be considered ‘independent enough’ for a GLM analysis. Essentially, what is being tested for here is whether or not you have very highly correlated pairs of variables, e.g. nationality data and citizenship data in the social sciences (which for most people are the same).
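The pairwise screening just described can be sketched in a few lines of plain Python (the predictor names, toy data, and the tolerance of 0.7 are illustrative, not prescriptive):

```python
import math
from itertools import combinations

def pearson_r(a, b):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def flag_collinear(predictors, tolerance=0.7):
    """Return predictor pairs whose |r| exceeds the chosen tolerance."""
    return [(p, q) for (p, a), (q, b) in combinations(predictors.items(), 2)
            if abs(pearson_r(a, b)) > tolerance]

# Hypothetical predictors: soil_moist tracks rainfall almost exactly,
# so only that pair should trip the screen.
predictors = {
    "rainfall":   [10, 20, 30, 40, 50],
    "soil_moist": [12, 19, 33, 41, 48],
    "altitude":   [5, -3, 8, -1, 2],
}
print(flag_collinear(predictors))  # only the rainfall/soil_moist pair is flagged
```

In practice only the pairs returned here would need a decision (drop one variable, or combine them); everything else passes the loose tolerance.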

For me, interaction terms are more or less a calculational artifice in GLM calculations: they arise as a result of using categorical variables in GLMs (see e.g. the R script I put together above). This second kind of term is necessary in the GLM fit equations because without it you can’t have differing functional forms between subsets of your data, but that is all. One way around the need for it is to rerun your GLM fits ‘by hand’ for each of your subsets.
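The ‘by hand’ workaround can be illustrated with a toy example: instead of one model carrying an x:group interaction term, fit the response against x separately within each level of the categorical variable and read off the per-group slopes. All data and group names below are made up:

```python
def ols(xs, ys):
    """Closed-form simple least squares: return (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Toy data: the response has a different slope in each group, which a
# single pooled fit without an interaction term cannot represent.
data = {
    "group_a": ([1, 2, 3, 4], [2, 4, 6, 8]),   # slope  2
    "group_b": ([1, 2, 3, 4], [5, 4, 3, 2]),   # slope -1
}

# The 'by hand' workaround: one separate fit per subset.
for group, (xs, ys) in data.items():
    intercept, slope = ols(xs, ys)
    print(f"{group}: slope = {slope:.2f}")
```

The per-group slopes recovered here (2 and −1) are exactly the differing functional forms that a single model could only express via an interaction term.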

In the ecological literature on GLMs there is surprisingly little about unbalanced designs. In the world of binary logistic regression, the concept of “sparse tables” is analogous, and there is evidence that convergence of GLM results is very tricky in these situations (see Alkhalaf & Zumbo 2017). Can we perhaps write a paper to show that this can be a problem in wider GLMs as well?
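A quick numerical sketch of why sparse or separated data is troublesome: when a binary response is completely separated by a predictor, the logistic likelihood has no finite maximum, so the fitted slope grows without bound instead of converging. Toy data below; plain gradient ascent stands in for the IRLS a real GLM fitter would use:

```python
import math

# Toy data with complete separation: y = 1 exactly when x > 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

def fit_slope(n_iters, lr=0.5):
    """Gradient ascent on the log-likelihood of y ~ logistic(b * x)."""
    b = 0.0
    for _ in range(n_iters):
        grad = sum((y - 1 / (1 + math.exp(-b * x))) * x
                   for x, y in zip(xs, ys))
        b += lr * grad
    return b

# The 'estimate' keeps growing with more iterations -- it never settles,
# because the likelihood of separated data has no finite maximizer.
print(fit_slope(100), fit_slope(1000))  # slope diverges
```

This is the one-predictor analogue of a sparse table: an empty cell (no y = 0 observations above the threshold) pushes a coefficient toward infinity, which is exactly the convergence trouble flagged above.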
