The methodology of our Scientific and Field Testing for Ranking Factors in Google
There is a lot of talk about whether SEO concepts are hearsay (or should that be heresy), or proven by some form of scientific testing.
More worryingly though, there seems to be an emerging trend for some SEO
influencers to claim that their ‘findings’ are the result of their
testing, yet they are unable to validate that testing if or when asked.
This is why this document is being published. It is to clearly outline
the methodologies behind our own internal testing, whether scientific or
‘in-the-field’ testing.
By publishing our strategies and thinking, it is then easier for any
person to understand our approach, and then decide on how much validity
to assign to our SEO statements.
An Up-front Disclaimer
There are two common categorisations of testing environments. These are
scientific testing, such as Single Variable Testing (SVT), and
in-the-field testing.
Scientific testing is performed under conditions that are as controlled as possible.
For example, a Single Variable Test (SVT) will only ever test the impact
of one single change to a web asset, such as whether having a keyword
in the H1 tag gives a ranking boost, or whether the number of words on a
page matters for ranking, etc.
In these tests, it doesn’t so much matter what the factor is. It is more
important that there are no other influences on that page’s ranking, and
that the same results can be reproduced over several tests.
The problem here is that the tests are somewhat limited, as they can
only really be validated on very clear ranking signals. For most
situations, ranking is a much more complex series of actions or
reactions that lead to a combined outcome.
So, when it is not possible to isolate a single factor, or a combination
of factors that can be scientifically controlled, it then falls to the
‘in-the-field’ testing.
In-the-field testing is not a science as such. It is a summary of experience, derived from perceived reactions to changes. We cannot clearly declare that the apparent results were a direct consequence of the changes made on-site, because an external factor may have played a part, such as a change in the Google algorithm or an increase in backlinks.
So, in straightforward terms, science-based testing should always be more reliable, yet it is far more limited in its scope.
Creating a Controlled Environment
For any test to have validity, it needs to be carried out in as close to a controlled environment as possible.
This means that if we are running tests across five sites, we ideally
need the same things happening at roughly the same time to each site.
For example, if we need to view one site for any reason, and click
through to a certain inner page, then we need to do that same thing on
each of the other sites, so that no one site gains a ranking advantage
that the others lack, as far as we can possibly control.
This means that while tests are live, we can’t really disclose the sites
that are being tested. If one site was shown during a webinar or
presentation, and several attendees visited that site, the results would
clearly become skewed.
If all sites were shown, it would still be unrealistic to expect random
visitors to go to each, and to treat each one exactly the same. This
means that many live sites cannot be disclosed.
Test sites take a long time to prepare and need to be established with
equal caution. If Google find out which sites are being used for
testing, there is also a high chance that they would be de-indexed,
thereby destroying the test and the work so far.
So, while we will disclose what we can, we simply cannot disclose everything, especially while it is still live.
Creating The Test Sites
Another area of debate is how the test sites should be configured and what texts should be used on those sites.
As for the number of sites, it is typical to run a test across at least
five sites in unison. This is so that once all sites are indexed and
ranking, the test factor change can be applied to the middle site to see
whether it goes up or down, in relation to the two above and two below.
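As a rough illustration of the five-site comparison described above, the following Python sketch measures how far the middle site moved after a change, and checks that the four untouched sites kept their order relative to each other. The site names and rankings are made up for the example, and the function names are our own; this is a sketch of the idea, not the tooling used in our tests.

```python
def middle_site_shift(before, after, test_site):
    """Return how many positions test_site moved (positive = moved up)."""
    return before.index(test_site) - after.index(test_site)


def controls_stable(before, after, test_site):
    """True if the untouched sites kept the same order relative to each other."""
    before_controls = [site for site in before if site != test_site]
    after_controls = [site for site in after if site != test_site]
    return before_controls == after_controls


# Hypothetical ranking order for five test sites, best position first.
before = ["site-a", "site-b", "site-c", "site-d", "site-e"]
after = ["site-a", "site-c", "site-b", "site-d", "site-e"]

print(middle_site_shift(before, after, "site-c"))  # 1 (moved up one place)
print(controls_stable(before, after, "site-c"))    # True
```

If the control sites have also changed order, the movement of the middle site is harder to attribute to the change that was made.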
It is also common for several pages to be created within each site, especially when the test is for internal link structures.
In any case, we always try to keep one site completely unchanged as the
benchmark, but we might use each test site more than once, using
different pages, but always allowing for any rank movement to have
settled before attempting new tests.
This is purely to allow tests to be performed more rapidly, when we have a stock of already ranked test sites.
As for the words, there is discussion about whether it is best to use
lorem ipsum texts, non-words created from English characters (such as
a3pzy6b), or whether to use actual English words, yet excluding any
words that might inadvertently impact on ranking (such as best, great,
cheap etc.).
For our testing, we are running the gauntlet to some extent as we are using real words from a real book on our test pages.
The thinking here is that the words are clearly in common use.
We took the book shown below in its entirety. So, if a word was
repeated, then it appeared in our word pool as many times as in the
book.
In fact, we ended up with a pool of almost 35,000 words, of which over 1,000 were the letter (or word) ‘A’.
If a word appears 1000 times, then it is safe to assume that it is a
more common word than one that appears only 20 times. By not removing
duplicates, we increase the opportunity for the common words to be
selected in similar proportion to their use in the English language in
general.
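The effect of keeping duplicates can be shown with a short Python sketch. Picking uniformly from a pool that still contains every repeat means a word that appears more often is picked more often. The sample text and function names here are invented for the example, and this is not the spreadsheet we actually use.

```python
import random
from collections import Counter


def build_word_pool(text):
    """Split text into lowercase words, keeping every repeat."""
    return text.lower().split()


def nonsense_sentence(pool, length, rng):
    """Pick words uniformly from the pool; repeated words are picked more often."""
    return " ".join(rng.choice(pool) for _ in range(length))


# Tiny stand-in for the full text of a book (made up for this example).
sample_text = "a path is a choice and a choice is a path"
pool = build_word_pool(sample_text)
counts = Counter(pool)

print(len(pool))    # 11 words in the pool, duplicates included
print(counts["a"])  # 4, so 'a' has a 4 in 11 chance on every pick

rng = random.Random(42)  # fixed seed so the same sentence can be regenerated
print(nonsense_sentence(pool, 6, rng))
```

Removing duplicates first would give every word the same chance of selection, which is the outcome we wanted to avoid.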
The book we chose was: Million Dollar Maverick: Forge Your Own Path
to Think Differently, Act Decisively, and Succeed Quickly by Alan Weiss
PhD.
We thought this was appropriate as we can truly say we have chosen Weiss Words.
What’s Left to Say
For the most part, the above covers the important parts of how we establish our tests.
We created a spreadsheet that auto-generates test pages, randomly
creating nonsense sentences, and then allows us to insert ranking
factors at the flick of a status switch.
An example is that we can choose to use keywords in the headers, tables,
lists, images, formatting etc., all from within the one sheet.
Whether the words in use are right or wrong is important, but as every
page is built equal, and from the same pool of words, in theory we
should have some parity in our created assets.
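To give a feel for how a status switch can work, here is a minimal Python sketch of a page generator. It is not our spreadsheet: the `FactorSwitches` class, the `build_page` function, the word pool and the keyword are all hypothetical names and values chosen for the example. The point it illustrates is that the filler text is generated identically for every page, so a control page and a variant page differ only where a switch is turned on.

```python
import random
from dataclasses import dataclass


@dataclass
class FactorSwitches:
    """Hypothetical on/off switches, one per ranking factor under test."""
    keyword_in_h1: bool = False
    keyword_in_list: bool = False
    keyword_in_bold: bool = False


def build_page(pool, keyword, switches, seed):
    """Build a simple HTML page of nonsense text from the word pool."""
    rng = random.Random(seed)  # same seed gives the same filler words

    def words(n):
        return " ".join(rng.choice(pool) for _ in range(n))

    # Filler is always drawn in the same order, whatever the switches say,
    # so a control page and a variant page differ only where a switch is on.
    heading = words(4)
    intro = words(12)
    items = [words(3) for _ in range(3)]
    closing = words(12)

    if switches.keyword_in_h1:
        heading = f"{keyword} {heading}"
    if switches.keyword_in_list:
        items[0] = f"{keyword} {items[0]}"
    if switches.keyword_in_bold:
        closing = f"<strong>{keyword}</strong> {closing}"

    list_html = "".join(f"<li>{item}</li>" for item in items)
    return "\n".join([
        f"<h1>{heading}</h1>",
        f"<p>{intro}</p>",
        f"<ul>{list_html}</ul>",
        f"<p>{closing}</p>",
    ])


# Made-up word pool (duplicates kept) and a made-up test keyword.
pool = ["path", "choice", "think", "act", "quickly", "own", "a", "a", "a"]
keyword = "a3pzy6b"

control = build_page(pool, keyword, FactorSwitches(), seed=7)
variant = build_page(pool, keyword, FactorSwitches(keyword_in_h1=True), seed=7)
print(variant)
```

Here `control` and `variant` share every line except the `<h1>`, which is the single variable being tested.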
So, with all that said, we will close by stating that we will disclose
our results once we have proven them internally. We may or may not
disclose the actual sites, but we will show proof of the changes we
made, the number of pages we used and the variations in rankings that
those tests brought.
If anyone wishes to request a specific test, then we will consider this,
but please do bear in mind that SEO is an ever-evolving target, and what
works today might not work tomorrow.
We hope you found this of interest and that you can now see the effort we put into testing the strategies that we discuss.
If anyone wants to talk in more depth about our strategies, then please feel free to reach out to us through our support desk.
