How to Fix the Most Common Advanced SEO Issues – Part 2: URL Parameters

by Alan Bleiweiss  |  Published 9:00 AM, Thu June 26, 2014

AUTHOR NOTE – This is part two of a mega-post covering what the most common advanced SEO issues are, why they need more attention than most SEOs give them, and how to fix them.


In Part 1 of this series, I gave a basic overview of how my cumulative site auditing experience has helped me realize that even though every site is unique and thus has unique problems, most sites—from small mom & pop eCommerce sites all the way up to global reach sites with more than 100,000,000 pages—share a number of advanced SEO problems.

Today, I want to start showing what to look for, how to determine the severity of problems, and how to go about the work of resolving them, starting with URL parameters.

URL Parameter Mess

This is Not A Developer Training Course

Although I’m going to talk about how to resolve big issues, this is not a Web development post. It’s for marketers and site managers, some of whom happen to be developers.

Because the majority of people reading this are marketers and managers, I’m going to give step-by-step task instructions. However, I won’t be explaining, for example, that implementing server-level 301 redirects means opening up your .htaccess file, or how to make changes to that file.

You can be sure, though, that if I do say, “To fix this issue, change all the dead URLs to 301 ‘permanently moved’ redirects at the server level,” that’s a best-practice way to fix a specific problem. HOW you do that work will depend on the individual developer or team you’ve got, and the methods they use to execute that task.

Many Ways to Skin An SEO Cat – Some Better Than Others

Sometimes, there is only one way to achieve an SEO goal. Other times, there may be five ways, one or more of which would work. And when multiple valid ways do exist, one or another may still be the better option.

When appropriate within the constraints of a post of this nature, I will note that multiple ways exist, and offer insights into why some might be better than others for SEO, or even for general business purposes.

Lastly, SEO is as much art as science. I, and many others, have been saying that for years. Combine that with the reality that Google is an unstable multi-headed beast and what we’re left with is that just because I recommend something does not mean completing that task is going to be the end-all for a given issue. Specific circumstances always need to be considered, including the ever-changing algorithmic landscape.

Having gotten all those warnings and clarifications out of the way, let’s dive in.

Confused URL Parameter Processing – Stop Already

Okay, so not every site uses URL parameters (also referred to as “variables”). Some sites actually have clean URLs. In my experience, I’ve been hired to perform audits on way too many sites that use URL parameters. And it’s ugly just about every time. Butt ugly.

Using GWT URL Parameter Tool - Warning

URL parameters, for those not familiar with the term, are a method of passing information through the Web address to the Web server in order to generate content unique to that page.

For example, if you’ve got a shopping site, and one of the categories on your shopping site is jewelry, you can communicate to the server, via the unique Web address for that category page, “show all the jewelry on this specific page.”

The URL for that page might be http://yourdomain.com/?category=jewelry

When that page is processed, a script inside the page’s code can then look at that URL and see, “Oh, for this page, the category is ‘jewelry,’ so only show jewelry here. Don’t show hats or blouses or pants.”

If we lived in a perfect world, this programming method would be totally valid for Web developers because everything in the world would be standardized. One result would be search engines having the capacity to know exactly what every developer intended to do.

In the real world, things get ugly fast when it comes to URL parameters.

What I wish Google would do is communicate the severity of the issue more clearly up front.

If your site has serious problems, URL parameter settings can make the problem much worse.

URL Parameters Going Off the Rails

The first problem we have with URL parameters is one developer might refer to that parameter type as “category,” another might refer to it as “cat,” and a third might just refer to it as “c.” So instead of one standardized classification within all eCommerce URLs, we end up with a mess.

http://yourdomain.com/?category=jewelry

http://yourdomain.com/?cat=jewelry

http://yourdomain.com/?c=jewelry

http://yourdomain.com/?cat=jewelry&subc=34&sort=HightoLow&r=top

When that happens, it starts the process of confusing the topical understanding of content.

What happens when you have five parameters, or fifty? If you do, and you multiply those by the number of Web developers out there creating countless eCommerce websites, the potential for mass confused naming becomes exponentially worse.

It can get downright painful.

http://yourdomain.com/?c=23&sc=9872&p=111&ms=high&ss=pop&co=green&sp=&qv=100

I’ve seen URLs with upwards of seventy-five parameters in the URL string.

Oftentimes, a developer made some massive mistake along the way and decided it was okay to pass EMPTY URL parameter fields on SOME of the site’s URLs, while on other URLs on that very same site, those very same parameters are dropped entirely when they’re blank.

How is it even half-reasonable, then, to expect search engines to understand what you’re doing on the site?

Google Webmaster Tools To The Rescue?

Google Webmaster Tools (GWT) is NOT your savior, your hero, or your fixer.

What GWT can do is HELP deal with a vastly diverse Web. Only to the degree that Google’s multiple algorithm system can figure everything out, though. And only if you don’t send mixed signals.

Recall the screenshot I posted above: in GWT, there’s a warning that using the URL Parameter Tool incorrectly could result in many pages disappearing from search.

What Google fails to warn users about in that statement are bigger pitfalls:

Missing Out On Topic Signals

When you use URL parameters like that, you are eliminating one opportunity to pass along important keyword signals. Sure, you can still seed page titles, H1 tags, content, and link anchors with keywords. Yet advanced SEO best practices are based on the cumulative value of reinforcing signals with other signals. So that’s a loss if your URL variables look like the examples I gave above.

Google has no way to know with certainty that ?c= means “the top grouping here is ‘category.’” And “c=23” can’t easily be identified as “all of this group is about jewelry.”
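
For comparison, here is roughly what the same page could look like with a descriptive URL structure instead of opaque parameters (the second path is hypothetical, purely to illustrate the signal difference):

http://yourdomain.com/?c=23&sc=9872

http://yourdomain.com/jewelry/necklaces/

The second version reinforces the keywords you’re already seeding in the page title, H1, content, and link anchors. The first tells Google nothing on its own.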

And even if you get all signals right in other areas, if you make a poor choice in how you want Google to treat any individual parameter, you’re only going to foul things up even more.

Generating Duplicate Content

The other really big problem with using URL parameters, even if you set them up in GWT, is that duplicate content confusion can occur on a massive, grotesque scale.

Important Pages Vs. Total Indexed

I’m not going to go into the entire concept of faceted navigation here—that topic alone could fill a book the length of a Harry Potter novel. Instead, I will cover the bigger picture factors, starting with using URL parameters to pass faceted navigation options from page to page, and how this is toxic to SEO for all but a rare few highly skilled developers who spend many hours getting super-granular with cross-signal SEO.

Faceted navigation allows site visitors to discover content through many pathways. This is a good thing when done with care, especially when you have many products, and loading them all at once can overwhelm the visitor.

Faceted Navigation Out of Control

When you have complex, advanced SEO problems, the moment you allow Google to crawl and make their own indexing decisions about all of those options, you expose your site to big problems. Panda-scale problems.

Mass Content Duplication is often a Panda Problem

The Solution to URL Parameter Problems

Yeah, no.

There’s no single solution if you’ve got URL parameter problems.

Like every other advanced SEO problem, it depends on how polluted the multi-signal system you’ve created (or inherited) has become.

The core concept, though, is that you need to craft a plan to declutter the noise you’re presenting to search engines. When you do that properly, you end up with better signals.

Reducing Duplication Can Improve Search Visibility

As the above charts for one client show, if you reduce the noise from multi-faceted duplication properly, over time, you have much better potential for increasing the quality of the signals that remain in place, and that, in turn, can bring in more organic traffic. You can’t rush it in all situations though. In the case of the above site, this was a long-haul, step-by-step rollout.

If your site is seriously broken from an advanced perspective, you need to be extremely careful about attempting to take corrective action.

There are, however, a few standard tasks that need to be done, depending on just a couple of factors.

Deployment Life Cycle – How Fast To Roll Out URL Parameter Changes

Scenario 1: Your organic visibility has caved in.

If you previously had very good or maybe even stellar visibility with Google, and then your site fell—suddenly or gradually—to the point where you’ve now only got a small fraction of that, this may very well be an opportunity to fix URL parameter problems, aggressively and in a short time frame.

It’s like, “Well, we lost 70% or 90% of our Google visits, so we might as well bite the bullet and get this corrected right away.”

In this scenario, the sooner you make the necessary changes, the sooner you’ll be on the road to recovery from the fall.

Also, in this scenario, you may want to skip the steps below regarding how to prioritize the cuts and changes. Instead, you can jump right over to Step 3, the “kill those parameters dead” section.

Scenario 2: Your organic visibility isn’t horrible, and you still get at least some revenue from that channel.

In this scenario, if you change too many signals too quickly, you can literally shock your site into even further losses, at least short- to mid-term. And it could take six months to a year after that for you to recover.

So if your site is a duplicate content / random indexation nightmare, or if you rely heavily on what remaining traffic you get from Google, you’ll need to roll out the changes much more slowly.

If this is your scenario (or if you’re not sure it is, and you just want to be cautious), don’t skip the following steps regarding prioritizing action!

URL Parameter Fixes

Step 1: Know The Bigger Picture

Yes, this post is about correcting URL parameter issues. No, it’s not a magic-bullet post. If you don’t understand the bigger picture—at least most of the other advanced SEO problems that are simultaneously harming your site—making just these corrections may NOT get you the kind of results you need long term.

So before you begin working on cleaning up the URL parameter nightmare, get a grasp of that bigger picture. Understand that while you’re working on this issue, you would be better off working on other big problems simultaneously, or having different team members / outside consultants / agencies working on those.

Not sure how to evaluate that bigger picture? You’d be very wise to perform a proper strategic audit. Need a quick primer on several core factors to look at? Go view my slide deck on how to perform SEO audits.

Step 2: Establish a Consolidation Plan

Let it be stated for the record: I love Google’s Maile Ohye. There. Now that I’ve said it “out loud,” here’s why: She’s brilliant. Absolutely brilliant.

Not only is she brilliant, she routinely comes out with new official Google content to help webmasters understand complex technical SEO. Since she’s a Developer Programs Tech Lead at Google, and routinely works with their Search and Webmaster Tools teams, she knows a thing or 500 about SEO best practices from Google’s perspective.

Here’s what Maile has to say about the duplicate content problem:

“Google’s goal is to crawl your site as efficiently as possible. Crawling and indexing pages with identical content is an inefficient use of our resources. It can limit the number of pages we can crawl on your site, and duplicate content in our index can hinder your pages’ performance in our search results.”

In other words, if a visitor can get to a product through six variations of navigation paths (faceted navigation), the best practice for SEO is to eliminate, block, redirect, or canonicalize EVERY path to a product except one.
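
As a minimal sketch of that last option, a faceted URL can carry a canonical link element pointing at the one version of the page you actually want indexed (the URLs and parameter names here are hypothetical):

<!-- In the <head> of http://yourdomain.com/?cat=jewelry&sort=HightoLow -->
<link rel="canonical" href="http://yourdomain.com/?cat=jewelry">

Keep in mind that’s a hint, not a directive; Google still decides whether to honor it, which is exactly why the rest of this post matters.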

While you can leave the site wide open and let Google decide what to crawl, what to eliminate from indexation, and what to score as more important within all those variations, that’s not in your best interests.

Let’s say you have a dozen URL parameters specifically related to faceted navigation and sorting/filtering options.

By understanding your own data, you can create a spreadsheet that shows which version of content results is more valuable.

URL Parameters Out Of Control

Look at that mess. How can you have a TOTAL of 10,000 products, and end up with 243,000 URLs with the parameter “Style-Type” in them? Or 227,500 where the URL specifies how many products are displayed per page?

Think about that for a moment.
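
If you’re wondering where counts like those come from, one rough way to approximate them (alongside the counts GWT shows in its URL Parameters report) is a site: query scoped with inurl: for each parameter; the domain and parameter names below are hypothetical:

site:yourdomain.com inurl:style-type

site:yourdomain.com inurl:per-page

Compare those estimated counts against the number of actual products and categories you sell, and the scale of the duplication becomes obvious.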

This site has links in faceted navigation that let visitors choose to see just those products that match one of several style types (cocktail dresses, for example).

Except the site also lets people choose style type AND fabric type.

Or style type, and show only 10 products per page.

Or style type, and fabric type, and sorted high to low by price.

This is why Google chokes. Their system is like, “Wait, this is all just showing the same products over and over again. It’s killing our resources, so let’s assume all the URLs we’re not going to crawl are just the same regurgitated mess.”

Instantly, they’ve abandoned the crawl. What if the pages they didn’t crawl this time were changed? Updated? Added? #OUCH

Or, “Wait, all of these pages may be important for super-long-tail search, so let’s index them. But wait—when someone searches for ‘designer cocktail dresses,’ which of all those variations of sorting and refinement that we HAVE indexed do we bother showing people?”

That’s leaving it up to Google to decide what to show people in a search. Don’t just leave it to Google to decide!

Quick Fixes Are Sometimes Not Fixes At All

If you’re a “magic bullet” SEO, you’re thinking you can just slap in a one-size-fits-all solution. Whether it’s canonical tags, a reference in the robots.txt file, or maybe a meta robots noindex,follow tag, there is no single solution that works properly in every case. And on a site with many parameters, you may need a combination of tactics.

Remember my rule: If you use a magic bullet for your SEO, you’re not thinking bigger picture, and you’re going to cause me to see #AsshatSEO when I get hired to audit your site.

Google does NOT rely on just any one of those methods as a directive. Every one is just “one more signal to consider along with many other signals.”

Even if you DO use one of those signal or “hint” methods, search engines have a very complex process of determining how trustworthy those signals are. And if other signals conflict with the ones you use, forget about it. The nightmare begins. And if you have a lot of competitors, it just gets worse.

It’s Not Just Based On Volume Of Content Per Parameter

Once you lay out a spreadsheet per the example above, you need to go even further. Determine which parameters are the most valuable/important to the business, and which combinations are the most helpful for your prospective or existing customers.

Maybe for one site, the most popular products are “cocktail dresses under $200.” If that’s your primary target market, you need to understand that. For another site, it might be “cotton dresses.”

In either case, it is most assuredly not going to be “cocktail dresses under $200 that are popular and happen to be editor’s picks, displayed 50 to a page.”

Look at those things. Where’s the real opportunity for the most visibility across the competitive landscape? That’s a factor to consider as well.

Prioritize Which Parameters Can Be Eliminated First

Once you’ve got the data, you can see where at least some parameters are just regurgitating the same thing over and over, with little to no “important” value to be indexed in search engines.

With this knowledge, it’s pretty straightforward to decide which of these parameters can be killed immediately. Others, not so much. It takes real effort to make some decisions. So take your time, and get other people involved from the company if that will help.

Step 3: Kill Those Suckers Dead

Now the real work begins. One solution does NOT fit all URL parameter changes. Some will be perfectly suited for slapping into the robots.txt file. Some may just need a noindex,nofollow tag. Others would be better suited to a 301 redirect, maybe with a canonical tag thrown in for good measure. Still others may need a combination of tactics.

Generally speaking, it’s a case-by-case assessment.

Kill Small-Scale Parameters Fast and Hard

The smaller the impact, the more assuredly you can simply block indexation entirely. For example, if you’ve got 500,000 products, and 6 million pages indexed, you know it’s a massive mess. Yet within your 23 URL parameters, maybe one of them only has 5,000 pages indexed. This is more often the case with “popular” or “editor’s pick” type parameters.

If it’s a bottom-feeder-type parameter, go ahead and kill it off. In most cases, it was never valuable or even helpful for search purposes.

In this scenario, I would typically (yes, there are always exceptions) opt for a meta robots noindex,nofollow tag on all pages that have that parameter in the URL.

Or you could include it in your robots.txt file.
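
For reference, here’s what those two options look like in practice; the parameter name “editorpick” is hypothetical:

<!-- On every page whose URL contains the editorpick parameter -->
<meta name="robots" content="noindex,nofollow">

# Or block crawling of any URL carrying that parameter in robots.txt
User-agent: *
Disallow: /*?*editorpick=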

But if those pages are already indexed, it’s better to let search engines recrawl them so their system can see that new tag (a robots.txt block prevents that recrawl), rather than leave them to guess whether you intended for all of those pages to become noindexed.

And by all means, go into GWT, and for that parameter, confirm “Yeah, we don’t want you to index these anymore.” Just don’t rely ENTIRELY on that GWT setting. First, Google isn’t the only search engine. And second, remember—even the URL Parameter Tool is just one signal among many.

Bigger-Impact Parameters – Options Depend On Situations

Gone are the days when one method, or one combination of methods, was the only “best practice” for all sites.

Canonical Tags and Noindex,Nofollow Option

While canonical tags may help, they’re just one signal. You can use them in combination with noindex, nofollow meta robots tags.

In fact, for parameters that have a much bigger footprint, using a noindex,nofollow tag on each page and the canonical tag is often the best solution.

Google would almost always prefer this method because it allows them to continue to find that content, and helps them better understand your intent.
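
A minimal sketch of that combination on a bigger-footprint parameter page might look like this (URLs and parameter names hypothetical):

<!-- In the <head> of http://yourdomain.com/?cat=jewelry&fabric=silk&per-page=50 -->
<meta name="robots" content="noindex,nofollow">
<link rel="canonical" href="http://yourdomain.com/?cat=jewelry">

The canonical target should be a page you actually want kept in the index, which leads directly to the next caution.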

Be very careful though. I’ve audited sites that got their programming code wrong—canonical tags were pointing to pages that had noindex,nofollow tags on them. Yes, that really happens.

Sometimes 301 Redirects are Better

If, for some reason, you were getting a lot of organic traffic to those pages, there’s ranking value in them, so a 301 redirect may be more helpful because it passes along that value.
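
On an Apache server, a sketch of that kind of redirect might use mod_rewrite to catch the parameter in the query string (the parameter name is hypothetical, and note that this particular rule drops the entire query string, not just the one parameter; test before deploying anything like it):

# 301 any URL carrying the sort parameter to the same path with no query string
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)sort= [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]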

And by adding canonical tags alongside your 301 redirects, you can double up on the signal, indicating that those remaining pages really are the ones you want indexed, as an insurance step.

One reason you may want to avoid the 301 path: if you do 301 redirects today, and then in two months you decide it’s time to switch to a completely different URL syntax, you could end up with two- or even three-hop redirect chains. That’s ugly, painful, and further reduces crawl efficiency.

Another reason to avoid 301s is this: You already have 500,000 301 redirects in place. Now you’re going to add another 855,000. At some point, that alone can kill server resources. The .htaccess file implications alone are enormous. And if you’re on a shared hosting plan, that could be a serious problem.

Yet another factor here is that if you have overall crawl efficiency problems, and millions of pages, Google might need months to crawl all of those URLs.

The Robots.txt Option

Yet other situations may be best suited for blocking them entirely in your robots.txt file.

If those pages were always very low quality, or they were the fifth, sixth, or tenth variation of pages on your site displaying the same content over and over, it can sometimes help to just cut them out of the system entirely.
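
A sketch of that kind of entry, using the wildcard syntax Google supports (parameter names hypothetical):

User-agent: *
# Stop crawling any URL that carries these sorting/refinement parameters
Disallow: /*?*per-page=
Disallow: /*?*fabric=

Keep in mind this blocks crawling, not indexing; URLs that are already indexed can linger in the index without being recrawled.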

In most situations involving large-scale changes, if too many links from other sites point to those pages, or other off-site signals say, “That content in that combination is what people want,” Google could ignore the robots.txt file. And a mass jump in 404 pages can cause other “ghost” signals related to quality and trust.

Which means that sometimes a 410 status code is even better, because there’s less ambiguity about it. Instead of “not found” (it might still exist, we just can’t find it), it says, “Yeah, they’re gone—history—they don’t exist anymore.”

Google’s John Mueller has stated using 410s is a viable option:

“We generally treat 404 the same as 410, with a tiny difference in that 410 URLs usually don’t need to be confirmed by recrawling, so they end up being removed from the index a tiny bit faster. In practice, the difference is not critical, but if you have the ability to use a 410 for content that’s really removed, that’s a good practice.”
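
If you do go the 410 route on an Apache server, a minimal sketch using mod_rewrite’s G (“gone”) flag could look like this (again, the parameter name is hypothetical):

# Return 410 Gone for any URL still carrying the retired parameter
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)editorpick= [NC]
RewriteRule ^ - [G,L]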

If you don’t get the entry in the robots.txt file exactly right, search engines won’t even be able to understand it. Screw up that file, and it’s useless. So use this method with extreme caution.

Incremental Action May be Critical

The bigger the mess, and the more complex the problem, the more slowly I recommend you go when it comes to making these changes.

If the problems are enormous, you may be better off making changes to every other aspect of your toxic site footprint (page processing speeds, toxic inbound link profile, horrific over-optimization keyword stuffing, etc.) before you even begin eliminating URL-parameter-based content.

Once you do begin the process, if you have enough big-volume parameters that need to go, you may be wise to tackle them one at a time. In this scenario, you’d make the changes to one parameter, then sit and wait for search engines to catch up to the change. Depending on how many URLs are involved, it could literally take weeks or months (yes, I’ve seen it take Google up to two years) just to re-crawl all the URLs involved.

Of course, two years is a worst-case scenario. At the very least, I’d give it 30 to 60 days in between each big change of URL indexing.

Step 4: Address Additional Reinforcing Signals

If you have links internally within your site pointing to parameter URLs you’re killing off, determine whether you need to change those, or maybe assign a nofollow attribute to those links.

Update your sitemap xml files as well. You don’t want to include URLs in there that you are killing off. It’s a conflicting signal.
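
As a quick sketch of both of those reinforcing signals (URLs hypothetical): an internal link you’re keeping for users but don’t want to endorse, and a sitemap entry limited to the version you do want indexed:

<a href="/?cat=jewelry&sort=HightoLow" rel="nofollow">Sort by price</a>

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yourdomain.com/?cat=jewelry</loc>
  </url>
</urlset>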

If you’ve got external links pointing to those URLs, do outreach for link reclamation as well so you can reaffirm the new message.

Consider Faceted Navigation Changes

As you do the URL parameter cleanup, take a serious look at that faceted navigation. Maybe it’s time to reevaluate your user experience as well. Maybe some of those parameters are based on filter and sort methods that are just annoyingly confusing to users, or overwhelming them with less-than-vital choices. In some cases, don’t just kill off the parameter for search engines—eliminate it for end users as well.

Testing and Monitoring Advanced SEO Fix Life Cycles Is Critical

As is the case with every other advanced SEO fix, when working to reduce URL parameter duplication, it will be vital to test changes, then monitor data.

Make the changes on a development server that’s blocked from search engines (put it behind a firewall!). Thoroughly test the change. Make sure something that was important for indexing or SEO didn’t get killed in the process. Make sure user experience wasn’t harmed. Test, test, test.

Then after you launch, test again. Test for unintended harm. Test for accuracy of the kill process. Test, test, test!
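
One quick way to spot-check after launch is to fetch a handful of the affected URLs and confirm the response is what you intended; for example, with curl (URL hypothetical):

curl -sI "http://yourdomain.com/?cat=jewelry&sort=HightoLow"

You’re looking for the expected status code (301, 410, or 200) and, for redirects, a Location header pointing at the clean URL you chose.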

Monitor Google Webmaster Tools, Google Analytics, Screaming Frog, and whatever other program or resource you rely upon related to site health specific to URL indexation, redirects, and dead ends.

A Final Word

As much as I’ve provided very important information here and offered actionable steps you can take to fix one of the biggest advanced SEO problems I see in my audit work, this is not a comprehensive training document. Some of the decisions you may need to make really should be left to seasoned experts. Yet I’m confident this is a good primer to at least get you going in the right direction.

And most important, I sincerely hope it will open your eyes to the fact that the URL parameter issue is much more complex than most resources suggest.

Let me know what you think in the comments. And let me know which advanced SEO problem topics you’d like me to cover in upcoming posts in this series!

Stay tuned for How to Fix the Most Common Advanced SEO Issues – Part 3!

About 

January 2014 marks Alan's 20th year as an Internet Marketing professional. Providing SEO solutions to clients since 2001, Alan specializes in forensic SEO audits and related consulting services to select clients around the world. Visit his site for more information on how he might be able to help you.

Comments
  • Nick LeRoy

    Great post Alan. This is something I too look at when auditing extremely large dynamically driven websites. You mention “In this scenario, I would typically (yes, there are always exceptions) opt for a meta robots noindex,nofollow tag on all pages that have that parameter in the URL.” What’s your thought on noindex,follow, assuming there either is (or there’s a future possibility of) the page acquiring link equity? I’m assuming your concern would be managing crawl budget?

  • billslawski

    There’s nothing quite like that Google message in Google Webmaster Tools that says you “have too many URLs.”

    I actually like whittling down the content and URLs on a site to a manageable level. Often that means making sure that “email a friend,” “refer a friend,” or “compare products” pages, or https pages that don’t need to be https, aren’t being crawled and indexed.

    When you crawl a site and discover that instead of the 30,000 URLs a “site:” search on Google estimates you have, you really only have 3,000 products and another 400 category pages, it becomes a challenge to come closer to having Google index only the pages that you really want indexed. When Google reports over a million URLs on your site, and you start looking at what they’ve listed, and lots include Session IDs and Tracking IDs, you know at that point that you can make those numbers shrink.

    It’s not uncommon to have pagination pages where pagination markup can be set to help limit how much duplicated content Google might see on your site.

    It’s not unusual for sites to have been set up so that all of the crawlable and indexable pagination pages in a series have had canonical link elements added to them so that only the first page in that series is used for every page of the series. That can mean that all of the other product pages on the other pages of the series don’t have a click path that a crawler can follow to capture them. It’s even more fun when new search engine friendly URLs are created for canonical link elements on the pages of a site, but the pages at those URLs don’t actually exist.

    I’m brushing the surface, but it can be a lot of fun to get a page with 300,000 estimated URLs at Google to go down to 15,000 or less. Even better is when the marketing manager you’re working with tells you that he just got out of a June Board meeting where he got to tell the board that the SEO you’ve done resulted in more sales on their site in the past 6 months than in the whole prior year.

    • alanbleiweiss (http://alanbleiweiss.com/)

      Thanks for commenting Bill. Glad you bring up some of the other common causes – and the whole “we’ll just canonicalize all of these to the 1st page” issue is a real problem out on the web.

  • gareth jax (http://www.andreascarpetta.com/)

    Great post as usual Alan! I have a question: there is an increasing number of websites that use ajax technology to filter product listings using cute widgets and sliders (just eye candy); how do you usually handle them?

    • alanbleiweiss (http://alanbleiweiss.com/)

      Gareth,

      AJAX is not a proper way to block content from search engines, nor is it a proper way to display content you want search engines to find. The reason it’s neither is that Google has gotten more and more capable of understanding AJAX, yet that capability is still not ideal, and too often the content within AJAX ends up being found and indexed as orphan content, devoid of brand identity and devoid of primary site navigation.

      There ARE ways to help search engines make use of AJAX, however in most cases it’s executed poorly, just making things worse. As a result, I encourage site owners to avoid attempting it unless they do extensive, costly testing on a big enough scale.