SharePoint 2007 Diary
System.OutOfMemoryException in SiteData During Search Crawl
Developer Notes Series
This is a problem that has
been eluding me for a while. Well, two if not three months. Each time we ran a search
crawl we would get an OutOfMemoryException on certain pages. And of course, for the
longest time we could not resolve this issue.
This is a log of what happened
and what a colleague, Tim Clark, and I think is the issue. The reason I am telling
the entire story is that the reader may pick up on several other SharePoint 2007
pointers.
For starters, we used to get
the System.OutOfMemoryException only on our production servers but never on our
development servers, which were a bunch of virtual servers. This discrepancy was quickly
resolved. One glance at the crawl logs and I saw that most pages were not being
crawled because the web server was timing out in serving them. Once I increased
the timeout value we started getting the exceptions. So now the playing field was
even: I could get the error in all our environments.
Then I did what everyone does:
started googling for this error. I got just one hit, on some SharePoint forum, with
no responses to the person's question. So that did not help.
So then the analysis started.
We were getting the exception when the crawler would hit certain very large lists.
These lists were large for two reasons: they had a large number of items, i.e., 8,000 to
15,000, and the items/documents themselves were large in size. The next step, therefore,
was to break these lists into smaller ones. Using the MS guideline that lists should have
fewer than 2,000 items (even though this guideline is about performance, not
OutOfMemoryExceptions), I started cutting the big lists into smaller lists of 2,000
items each. No help. I still got the errors.
I then cut the lists down
to just 400 items each, and that did not help either. Now I was in a bit of a bind.
So the next conclusion was that the error was occurring when the indexer was going
through the documents and images, as these were large files.
The one problem I had while
looking at the ULS verbose logs was that while I was seeing the exception being
thrown, I could not tell which server was throwing it. Was it the machine that was
crawling (the indexer) or the web server that was being crawled? And what exactly is
the data that is returned to the indexer?
These questions led to getting
some support from Microsoft directly and some more detailed logging, this time
of every web call being made to IIS. Now I got some interesting data. The exception
was being thrown by the web server, not the indexer. The exception was thrown by
SiteData.GetContent().
Now I had something to go
with. In the meantime, with the help of Microsoft, we managed to debug the w3wp process
and send Microsoft the dump file. The finding was heap fragmentation. Hmmmm, now
why would that happen?
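For readers who have not run into heap fragmentation before, the sketch below shows the general idea in Python. It is only a toy first-fit allocator over a list of slots, not a model of how the CLR or the w3wp process actually manages memory, and every name and number in it (`fragment`, 60 slots, blocks of 4 and 1) is made up for illustration. The point is that freeing memory does not help a large request if the free space is split into small holes.

```python
# Toy model only: a fixed "address space" of slots with a first-fit allocator.
# This is NOT how the CLR or w3wp manage memory; all sizes are hypothetical.

def allocate(slots, size, tag):
    """Mark the first run of `size` free slots with `tag`; raise MemoryError if none."""
    run = 0
    for i, owner in enumerate(slots):
        run = run + 1 if owner is None else 0
        if run == size:
            start = i - size + 1
            slots[start:i + 1] = [tag] * size
            return start
    raise MemoryError(f"no contiguous run of {size} free slots")


def release(slots, tag):
    """Free every slot owned by `tag`."""
    for i, owner in enumerate(slots):
        if owner == tag:
            slots[i] = None


def largest_free_run(slots):
    """Length of the longest run of contiguous free slots."""
    best = run = 0
    for owner in slots:
        run = run + 1 if owner is None else 0
        best = max(best, run)
    return best


def fragment(total_slots=60, big=4, small=1):
    """Interleave short-lived big blocks with long-lived small ones,
    then free only the big ones, leaving holes between the small blocks."""
    slots = [None] * total_slots
    rounds = total_slots // (big + small)
    for n in range(rounds):
        allocate(slots, big, ("big", n))
        allocate(slots, small, ("small", n))
    for n in range(rounds):
        release(slots, ("big", n))
    return slots


slots = fragment()
print("free slots:", slots.count(None))
print("largest free run:", largest_free_run(slots))
try:
    allocate(slots, 20, "large-response")
except MemoryError as exc:
    print("allocation failed:", exc)
```

With these made-up numbers, 48 of the 60 slots end up free, yet the largest hole is only 4 slots wide, so a request for 20 contiguous slots fails. The same request against a fresh, empty list of slots succeeds immediately.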
We tried various tricks and
configuration settings to choke down the crawler so that fewer requests were being
made to the machine being crawled, thinking this might prevent heap fragmentation.
The theory of the moment was that for whatever reason the .NET Framework or the
SharePoint ISAPI DLL was not releasing memory in a timely manner.
While all this was going on
I was also reading up as much as I could on how most SharePoint farms were set up.
One distinct difference between ours and everyone else's, or most others, was that
we were on a 32-bit environment. Each time I raised this issue with Microsoft and
with all the SharePoint folks I worked with, I was told that there was no reason to
believe that this error was a result of using a 32-bit environment vs. a 64-bit one.
I disagreed, but more about that later.
So next I started loading
SharePoint assemblies into Reflector to see what's in them, and more specifically what
is going on in SiteData.GetContent(). The code looked quite normal.
It just opens the
SPWeb and then gets data for the list. One interesting observation was that it returned
XML with data on the list. So none of the images, documents, etc. were being returned
to the indexer, which meant the memory fragmentation was not occurring because of
handling large image files or documents.
I then created a web part
that would invoke SiteData.GetContent() for the list that was giving the exception.
This is when the findings started to become interesting. Tim Clark, who does not
have a blog so I cannot point to it, started helping me by further analyzing this
problem.
I wanted to see what would
happen if we had an application that made the same calls the crawler did, or something
similar. So for starters he created a console application that went through each
list in each site, making the same calls that SiteData.GetContent() does. Hmmmm,
no OutOfMemoryException. Now this was interesting.
Tim then made some interesting
observations.
- If we did an IIS reset and invoked a web part with a call to SiteData.GetContent() on one
of the bad lists, we would not get the OutOfMemoryException.
- If after an IIS reset we ran
the crawler, we got the OutOfMemoryException. From this point on, loading the web part
with a call to SiteData.GetContent() would also result in the OutOfMemoryException.
However, calling the console application that made the same calls would
not.
This was interesting. Also,
looking at perfmon, it appeared that we were getting the OutOfMemoryException even
when the machine had plenty of memory to spare.
So what was causing this to happen?
The analysis of the w3wp dump file by Microsoft said that there was heap fragmentation.
It appears that this was specific to the w3wp worker process. Further analysis of
the data from the SiteData.GetContent() calls showed that this method returned huge
XML files. They were huge for the lists in question, but not
because these lists had images or documents. They were huge because the lists were
of ContentTypes that had a large number of Site Columns with large amounts
of data in them. Some of the XMLs were 15 to 20 MB.
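To see how item count, column count and field size multiply into a response of that size, here is a rough back-of-the-envelope sketch in Python. The function name and all of the input numbers are hypothetical. It also assumes, purely for illustration, that one response carries every field of every item, which this post does not confirm.

```python
# Rough estimate only. The figures below are hypothetical, not measurements
# from the farm described in this post.

MB = 1024 * 1024


def estimate_list_xml_bytes(items, columns, avg_field_bytes, markup_overhead_bytes=0):
    """One field per column per item, plus optional per-field markup overhead."""
    return items * columns * (avg_field_bytes + markup_overhead_bytes)


# A 400-item list with 40 columns of short values (assumed 20 bytes each).
small_fields = estimate_list_xml_bytes(items=400, columns=40, avg_field_bytes=20)

# The same list shape, but with long text in every column (assumed 1 KB each).
large_fields = estimate_list_xml_bytes(items=400, columns=40, avg_field_bytes=1024)

print(f"short fields: {small_fields / MB:.2f} MB")
print(f"long fields:  {large_fields / MB:.2f} MB")
```

Under these assumptions the short-field list comes to roughly 0.3 MB, while the long-field list lands a little over 15 MB even though it has only 400 items. That would be consistent with cutting the lists down to 400 items not making the error go away.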
Our guess (and it's just a guess)
is that for some reason the SharePoint ISAPI DLL or some other process is not
releasing/disposing objects in time. The reason is that the same or equivalent code can
be called from a console application where we, not SharePoint, get an SPWeb object, get
all sites, lists, etc., and Dispose the objects, and get no OutOfMemoryException error.
All this
is interesting, but what can one do to solve this? Well, there are two possible options
so far:
- Remember the 32-bit vs. 64-bit question I raised? My guess is that with 64-bit,
the additional RAM and address space available may make it harder to fragment the memory
as much. This is not fixing
the problem; it's simply making the problem harder to reach. Incidentally,
we received a document from Microsoft now recommending a 64-bit environment. So
we went out and purchased a few cheap 64-bit desktops, set up our farm and ran a
search crawl. No OutOfMemoryExceptions. Wooo hooooo!
- Since the memory/heap fragmentation
is occurring only for the w3wp process, I figured why not recycle the worker process
in the application pool. After some initial adjustment we set the process to recycle
when it reached a certain amount of physical memory or virtual memory. We ran the
search crawl. No OutOfMemoryExceptions.
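The recycle rule in the second option boils down to a threshold check on two numbers. The sketch below spells that rule out in Python. In practice the application pool's own recycling settings do this job, not custom code. The class name, function names and limit values here are hypothetical and are not the values used on the farm described in this post.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class RecycleLimits:
    """Hypothetical thresholds, in MB, at which the worker process is recycled."""
    max_used_mb: int      # physical memory limit
    max_virtual_mb: int   # virtual memory limit


def should_recycle(used_mb: int, virtual_mb: int, limits: RecycleLimits) -> bool:
    """Recycle as soon as EITHER limit is reached."""
    return used_mb >= limits.max_used_mb or virtual_mb >= limits.max_virtual_mb


def first_recycle_point(samples: Iterable[Tuple[int, int]],
                        limits: RecycleLimits) -> Optional[int]:
    """Index of the first (used_mb, virtual_mb) sample that triggers a recycle."""
    for index, (used_mb, virtual_mb) in enumerate(samples):
        if should_recycle(used_mb, virtual_mb, limits):
            return index
    return None


# Made-up memory samples taken while a crawl is running.
limits = RecycleLimits(max_used_mb=800, max_virtual_mb=1400)
samples = [(300, 600), (500, 1000), (650, 1450), (900, 1600)]
print("recycle at sample:", first_recycle_point(samples, limits))
```

In this made-up run the virtual memory limit is crossed before the physical one. That is the situation a fragmented process can be in while perfmon still shows plenty of physical memory to spare.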
What is the cause of the exception? I still
do not know. But I think so far we have found two workarounds. Also, as I mentioned in
an earlier post, look into your data. One of the reasons we were getting this error
is that we had lists that contained huge amounts of data. Lists are meant to be
lists, not essays or books. If essays or books are what you have,
put your data in a document.