SQLvariations: SQL Server, a little PowerShell, maybe some Hyper-V Rotating Header Image

SSAS 2008

SQL Server Analysis Services 2008

MacGyver Meets TSQL2sDay

Please be aware that I haven’t had a chance to double-check the technical details of this post before the deadline so if you see an error please feel free to comment about it.  The events happened years ago but it’s funny how history has a way of repeating itself.

Because I have too few hours in each day and I still suck at this blogging thing I am combining my MacGyver Moment and TSQL2sDay posts into this one so here we go.

The Beginning:  Once upon a time in a conference room on a very high floor sat over a dozen technical people trying to quickly fix a server that had begun crashing the day before.  This company was pretty big.  Their data, it was massive.  We only worked with summary rows because of this and yet it was still in numbers that make you stop and shake your head every once in a while.  For instance, one time the daily import process failed just a few percent in and it already had over 400,000 rows cleansed and loaded. 

To set the scene a little: We had developers, data folks, server experts, network experts, a SAN expert, the SAN expert’s boss, application SMEs, business liaisons, directors, and the company’s TAM from Microsoft.  Oh and another SAN expert who also doubled as a Microsoft SQL Server MVP who just happened to specialize in High Availability (specifically clustering).  This conference room had been turned into a War Room and was going to stay that way for as long as it took until the problem was fixed. 

The Problem:  A problem child of an application that already had a reputation for crashing every once in a while had now started crashing every couple of hours.  I don’t know the exact specs of the hardware anymore but it was pretty close to the top of the line.  It had something like 16 LPs and 16 Gigs of RAM backed by nearly 2 dozen disks on a SAN that held over 200 disks.  This server had a twin and based on the error logs that was part of the problem.  It was in a single node failover cluster and all of a sudden it had started logging 15,000 millisecond timeout errors while trying to check on it’s disk resources.  This timeout in turn was causing whichever node was active to freak out and think that it had lost contact with the SAN altogether. 

The Situation: While the overall SAN supporting this server is large by a lot of companies’ standards it was actually just a run of the mill small SAN at this place.  In fact we later ordered up another one that looked just like it (only 2 years faster in every aspect) but at the time we kept on being told how solid this particular SAN was.  There were a lot of moving parts and we were a fresh set of eyes being brought in.  One of the moving parts that we had taken note of was that another server on different disks on the same SAN had recently been rebuilt as a 64 bit server.  We took a look and said I’m sorry but your SAN just can’t support the needs of this SQL Server.  You would have thought I had just told the SAN guys that I had caught them cross-dressing or something like that.  They assured me that “No [little] Windows box” could ever take down their mighty SAN.  Regardless of that we were stuck with that particular SAN for at least the next couple of weeks and needed to come up with something to help.

The Other Problem:  We asked for all the IO stats they could give us but unfortunately there was something about a new version that needed to be installed and needing to wait for an outage window and all that.  Bottom line we were stuck with some total IO numbers over a wide enough time frame that they weren’t all that useful however; being a glorified mathematician I took all the numbers I had and tried to figure out the size of the ‘unknown’ portion by subtracting out the three servers I could get stats for myself.  As it turned out in one particular time interval that was just enough information for me to notice a trend.  It wasn’t much but it was more solid than anything else being offered up (or at least I felt so) and the situation was to say the least a little tense.

The Lame-Brained Idea:  One of the sever experts mentioned that he could “borrow” some RAM from a server that hadn’t gone into production yet and we could stick that in the servers (plural because they are a FC) that were crashing to see if that would fix it.  Given the error message mentioned above I just didn’t think this would work.  I had come up with something that just didn’t sound right even saying it but par for the course I said it anyways.  I asked them to humor me and stick the “borrowed RAM” into a box that wasn’t crashing; that other box in the situation that had recently been rebuilt as 64 bit.  They laughed at my idea and I don’t blame them.  Unless you stepped through the situation in the the exact same order as I did this was like saying let’s just reboot and see if that fixes it (oh wait it does reboot every time it crashes :-) ).  It’s easy to decide to stick the only spare RAM you have into a server that’s crashing; it’s a little bit tougher to make the call to bring down a completely different system and have them stick the RAM in that one.  If you’re wrong you’ve just caused a second outage for no reason. 

Stupid Windows Guys:  That’s what they must have thought when they agreed to give my idea a try.  We had all noticed the crashes had been happening almost like clockwork.  Almost but no one could tie them back to anything else happening on that machine.  As we all know, almost only counts in horse shoes and hand grenades.  And occasionally, wild ass guesses.  Funny thing happened though.  We were able to guesstimate the next crash within 5 to 10 minutes at this point only it didn’t happen this time.

The Hidden Pattern:  Now I’ve always enjoyed learning about the storage engine internals but I sure ain’t no Paul Randal.  I am a quick study and love a good disaster though.  What I had found was that there was cube that was being built every couple of hours on the newly minted 64 bit server and while the system did a good job of still letting you query the old cube while the new cube was being built there would inevitably come a point where something was happening that was causing the SSAS engine to flush ooh-gobs of data back down to the disk.  More data than the total size of the cubes even.  The SAN ended up getting saturated with gigs and gigs of non-stop writes (and I hear with RAID 5 that’s like a bad thing but maybe it’s just a rumor).

The Takeaway:  Make sure you know what tools your SAN admin has available and more specifically find out what they can collect on a continuous basis for weeks on end, you might be surprised by the answer.  The disks your server relies on may be fine but the controller cards accessing it might be getting saturated from something else.  You may need to purchase some additional software or something.  Look for solutions in places they might not make sense unless you hold your head just right.  The error log is your friend.  And above all, just because someone says it’s not possible, don’t completely believe them until they prove it to you.

Tag-You’re-it:  To keep the fun going I’m tagging Peter Schott and Matt Schultz (Matt blame Jorge) to find out what their MacGyver moments have been.  Thank you Geoff Hiten for tagging me.

SSAS Server Cant See Its Own Cubes in SSMS

I’ve got an oddball situation going on here (I know what you’re thinking: Par for the course).  I’ve built out a pair of new Windows Server 2008 R2 \ SQL Server 2008 SP1 CU5 machines.  Both are running great and returning data faster than anything we’ve ever seen before except for one small problem:  One of the servers can’t see it’s own cubes.  The cubes are there and everything.  I can connect to them with BIDS to deploy updates, SSMS to process them, and I can even query the cubes and get results back in Excel.  The server itself just can’t see them.

Here’s what it looks like if you’re logged into the machine directly:

LocalConnection  

And then here’s what it looks like if you connect to that same server from my desktop or any other server:

RemoteConnection

Another weird thing is that the other server that was built out the same way at the same time can see it’s own cubes just fine.  It’s just this one server that can’t see it’s own cubes.

I’ve only heard of one other person who has seen this behavior so the whole point of this blog post is just to see if anyone else is running into this issue.  If so please comment and we can get a connect issue filed if it’s prevalent enough reproduceible.

* Please Note: The names have been changed  to protect the innocent servers under my corrupting influence control.

**  Please take a look at the comments below:
***  As a work-around you can Right-Click on the SSMS icon and select > Run as Administrator to get SSMS to see it’s own cubes:  “If you skip step in the setup where it asked you to add current user to AS server admin role, then you would need to run SSMS as administrator.”

4 Gig limit in SSAS 2008 Dimension Processing

*Please note that this post is a work in progress and I’m still waiting on more answers but I wanted to get the information out there in case anyone else is running into this issue too:

OK that title might be a little bit over the top for what I’m dealing with but if you were in my situation that’s the exact kind of article title that you have been looking for.  I have been trying to migrate a set of cubes from Analysis Services 2000 to SQL Server Analysis Services 2008 and have run into a number of glitches but the most annoying of them was this one…

  • File system error: While attempting to read information from disk, a read error occurred for physical file: ?D:MSSQLMSAS10OLAPTempMSMDCacheRowset_1868_258c_nompm.tmp, logical file: . Errors in the OLAP storage engine: An error occurred while the ‘generated attribute’ attribute of the ‘MyDimension’ dimension from the ‘MyCubes’ database was being processed.

I searched online and saw that there is a KB that describes this same type of error message: http://support.microsoft.com/kb/970184/

…  And went on to say ” The fix for this issue was first released in Cumulative Update 2 for SQL Server 2008 Service Pack 1”. 970315  (http://support.microsoft.com/kb/970315/ )

After downloading and installing Cumulative update package 5 for SQL Server 2008 Service Pack 1 I once again tried to Process and came up with the exact same error message.

Unfortunately, I’ve now installed CU5 for SP1 on SQL 2008 and I was still getting that same error message.  Here’s what’s going on.  According to the SQL CSS support person that I spoke to, this message is actually telling me that I’ve run out of space in the temp file that SSAS has created for my cube while I was trying to process it.  My problems are (well, problesms as it relates to this cube migration, not all my problems in general :-) ) is that A) it’s a really ugly way of telling me that and B) my understanding is this is only supposed to happen to string stores when the Key is a string.  This dimension that I am working with has an int for a key, so apparently it’s also possible to happen for the Attribute Name, not just the Attribute Key.

As it turns out, I shouldn’t have run into this limitation in the first place though if settings had been kept like they were in AS2000.  What had happened was that when I used the Cube Migration Wizard it had decided to process my dimensions “ByTable”.  To correct this what I needed to do was go into every dimension that gets processed in the cube and set it to process “ByAttribute”.

After that I obviously had to build, deploy and process my cubes but with that change to each and every dimension in my cube I was finally able to start processing all the way through again.  There are a few other issues that that I’ve run in to but I will keep this post short and cover those in other blogs.  One thing that I wi ll note before I go though is that I have noticed a few people on MSDN running into this issue with string KeyColumns but in my case I was using an interger for my KeyColumn so you can get that error with an INT too.

From a processing perspective, it is a best practice to assign a numeric source field to the KeyColumns property rather than a string property.  Not only can this reduce processing time, in some scenarios it can also reduce the size of the dimension. This is especially true for attributes that have a large number of members, i.e., greater than 1 million members.”

You can read more about what this change means in the help files and get more info on increasing performance in the ‘performance guide’ (you’ll want to skip ahead to page 53).