How to Build Scalable .NET Server Applications: Memory Management
 
Posted by: JD Conley 6/26/2006 9:24 PM

I'll get this out of the way from the start. This series of blogs will have nothing to do with ASP.NET or web services. However, if you plan on writing your own implementation of IIS in managed code this would probably be a good place to start. :) I also won't be providing very many code examples, as I'd be flogged by our intellectual property lawyers. You will not be able to copy and paste and create your own scalable server. However, I hope to provide enough insight so you can avoid a big list of gotchas we have had to figure out the hard way. This post covers one piece of a huge puzzle: memory management. Yes, you do have to think about that in .NET, at least if you want to build a large scale application.

For those who don't already know, SoapBox Server is a part of our SoapBox Collaboration Platform that supports the XMPP protocol as well as most of the interesting JEP extensions. At the core of SoapBox Server is a highly efficient Socket server and thread machine capable of scaling into the hundreds of thousands of simultaneous users, and it's built 100% on .NET (C# now, but used to be VB).

SoapBox Server is the first multithreaded Socket based server application I've had the pleasure of working on. During the course of building the SoapBox Server into the extremely scalable and reliable system it is today I've learned a few things (as has the rest of the team, I hope). Thanks to Chris (who already had tons of experience with such things in Win32/C++), a few bloggers out there, some books, customers finding very interesting bugs, Windbg with Son of Strike, oh and Starbucks, I'd say I'm pretty well versed in the land of building scalable server applications. I'm no Jeff Richter, mind you, but I feel I have now learned enough to at least speak intelligently about it.

In that spirit I'd like to share the fruits of our tuning and debugging work, which, if history repeats itself, will continue to evolve as we begin work on our next major revision of the product. First, I'd like to repeat something I said a couple of paragraphs ago: SoapBox now scales to hundreds of thousands of simultaneous connections with a single piece of server hardware. Think about that for a second. A user brings up an IM client, connects to SoapBox Server, and then holds that connection open until they log out. Repeat hundreds of thousands of times. This is no simple task. The .NET CLR does not provide a magic "Process.Scalable = true" property. We have invested hundreds of hours into tuning (maybe thousands) over the life of the server on classes of hardware varying from single processor laptops to 16-way Itanium2 systems with 64GB RAM. We've been through four distinct processing models as well as quite a few iterative improvements on our Socket interaction layer. Basically we have run the server under a bunch of different profilers under many scenarios, found slow bits of code, and fixed them. But I'm not going to talk about profiling and performance tuning; perhaps another time. I'm going to talk about memory and scalable applications.

Every time your application creates a new Socket, Windows pulls memory from its Nonpaged Kernel memory, which is simply physical memory that is reserved by the kernel and will never be paged out to disk. This block of memory has a finite limit and the kernel picks the limit based on the amount of physical RAM available to it. I don't know the exact algorithm, but with 4GB RAM it's usually somewhere around 150,000 TCP Socket connections, give or take. Want to see this in action? Simply create a loop that instantiates sockets. It will stop working eventually with a SocketException telling you there isn't enough buffer space. On top of this hard kernel level limitation, you also have to worry about how much memory each concurrent connection uses in your own application. In SoapBox we store a lot of information about each connection in memory in order to improve performance and decrease our IO operations. This includes things like the user's contact list, their last presence (available, away, busy, etc), authorization information, culture information, user directory information, etc. If we didn't hold this in memory we'd have to hit a file, database, or some other out of process persistent store for the information every time we needed it. Being IO bound is no fun. Believe me, we started out that way.
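The socket loop mentioned above could look roughly like the sketch below. The class and method names are made up for illustration, and the maxSockets cap is an added safety valve so the probe can be run without actually starving the machine; raise it only on a box you don't mind bringing to its knees.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

// Hypothetical helper, not part of SoapBox: counts how many TCP sockets
// can be created before the OS refuses (or the cap is reached).
public static class SocketLimitProbe
{
    public static int CountCreatableSockets(int maxSockets)
    {
        var sockets = new List<Socket>();
        try
        {
            while (sockets.Count < maxSockets)
            {
                sockets.Add(new Socket(AddressFamily.InterNetwork,
                                       SocketType.Stream,
                                       ProtocolType.Tcp));
            }
        }
        catch (SocketException ex)
        {
            // When the kernel runs dry this is typically NoBufferSpaceAvailable.
            Console.WriteLine("Stopped after {0} sockets: {1}",
                              sockets.Count, ex.SocketErrorCode);
        }
        finally
        {
            // Hand the kernel memory back.
            foreach (Socket s in sockets) s.Close();
        }
        return sockets.Count;
    }
}
```

Calling SocketLimitProbe.CountCreatableSockets(1000) on a healthy machine should simply return 1000; the interesting result only shows up with a very large cap.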

However, because of our extensive caching, SoapBox Server 2005 can only reliably handle about 20,000 simultaneous connections on the beefiest of 32 bit hardware (on 64 bit it's much, much, much higher -- I also have to admit we haven't stress tested the 2007 build on 32 bit hardware, it would probably be much higher now). It doesn't matter if you have 64GB RAM and 16 32bit processors, we can still only handle 20,000 connections. Why, you ask? Well, it's because of the 2GB (well, really 3GB with a boot.ini switch) virtual memory limit per process in 32bit Windows. Without delving into managing your own memory your process is only allowed up to 3GB to play with. Typically, we use that up, or rather, .NET thinks we use it up, somewhere between 20,000 and 30,000 connections. Now why would I say ".NET thinks we use it up?" Story time!

A little over a year ago one of our customers kept running into a very bad situation. As evidenced by the Event Log, SoapBox Server was crashing (insert shock and awe here). It was an irregular occurrence, but it did happen, and we did not take it lightly. This customer was running about 2,500 simultaneous connections on a Dual Xeon with Hyperthreading and 4GB RAM and the /3GB switch set. It was plenty of hardware for the job, and probably overkill. However, the service was still crashing. We set them up with the Debugging Tools For Windows and had them start up the process to wait for a crash (another blog we'll have to write some day). After a few tries we got a dump with some useful information in it. The result? We were out of memory, sort of.

In .NET when you call any socket operation and pass it a buffer, whether it be a send or receive, synchronous or asynchronous, it takes that buffer and pins it before giving it to the Winsock API's. Pinning, in a nutshell, is taking a .NET data structure and telling the .NET CLR memory manager not to move it until it is explicitly un-pinned. The memory manager in the CLR is smart. As you allocate and deallocate memory it is constantly defragmenting it for you so the overall memory footprint is lower. There are quite a few really good/long/complicated articles on how this works so I won't bore you. However, pinning throws a wrench in this and the memory manager isn't quite smart enough to deal with it well (though it has gotten a lot better in 2.0). Basically, that buffer you want to put on the socket cannot move in memory (physically -- in terms of your virtual memory space) from the time the socket IO operation begins until it ends. If you look at the Winsock2 API's this is obvious, since the buffer is passed as a pointer. Anybody who's built this type of application in Winsock2 is probably saying "DUH!". I'd consider this a very leaky abstraction. Due to this behavior, it is quite easy to write a socket application in .NET that runs out of memory.
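To make pinning a little more concrete, here is a small sketch that pins a buffer by hand with GCHandle and checks that its address survives a garbage collection. This is only an illustration of the concept; the socket classes do their pinning internally, and PinningDemo is a made-up name.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo class: shows that a pinned array keeps its address
// across a collection, which is exactly what native code needs.
public static class PinningDemo
{
    public static bool AddressStableWhilePinned()
    {
        byte[] buffer = new byte[2048];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr before = handle.AddrOfPinnedObject();
            GC.Collect(); // unpinned objects may be moved here; this one may not
            IntPtr after = handle.AddrOfPinnedObject();
            return before == after;
        }
        finally
        {
            handle.Free(); // un-pin, so the collector can move the array again
        }
    }
}
```

The flip side is the problem described in this post: while the handle is held, the collector has to compact around the buffer instead of through it.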

Back to the story! Not only were we out of memory, but there was only about 200MB worth of data structures in the heap. For those of you like me that use calc.exe for all your basic math let me figure that out for you, 200MB > 3GB. Uhh, say what? How the heck were we out of memory? Well, we ran into the shortfall of pinning and memory fragmentation. The cause of this was a small number of small pinned buffers, in our case 2KB each, that were high enough in the heap to cause fragmentation spanning over 2.8GB. Where did the other 2.8GB go, you ask? Well, it was there, allocated by our process, but not being used by our code. In Son Of Strike (SoS -- a command line plug-in to the Windbg debugging tool I hope you never have to use) this showed up as free, empty, unused space! It was just sitting there waiting to be used, but we still ran out of memory. I think I mentioned earlier the memory manager in .NET isn't so smart when it comes to fragmented memory and pinning. Well, this is what happens in the worst case.

Good thing for you, the answer to all your memory fragmentation and pinning woes is quite simple. Pre-allocate buffers for use by anything that will be causing pinning, and do it early on before there is a lot of memory thrash (when your application is rapidly allocating and deallocating a lot of memory). We created a simple class called a BufferPool that we use to pre-allocate a certain number of buffers. This pool can grow as need be, but it does so in large chunks and forces a garbage collection each time before the buffers are actually used. This considerably reduces the chances of fragmentation caused by pinned memory. If the pool starts off with 500 buffers, but then the 501st buffer is needed it will grow by a configurable value, typically another 500 buffers, and the induced garbage collection will cause these buffers to shift to the lowest possible point on the heap.
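A minimal sketch of the idea follows. Only the BufferPool name and the behavior (pre-allocate, grow in chunks, collect on growth) come from the description above; the constructor parameters and the CheckOut/CheckIn/Available members are assumptions made up for this example, not the actual SoapBox code.

```csharp
using System;
using System.Collections.Generic;

// Illustrative pool: hands out fixed-size buffers and grows in large chunks.
public sealed class BufferPool
{
    private readonly object sync = new object();
    private readonly Stack<byte[]> free = new Stack<byte[]>();
    private readonly int bufferSize;
    private readonly int growBy;

    public BufferPool(int bufferSize, int initialCount, int growBy)
    {
        if (bufferSize <= 0 || initialCount <= 0 || growBy <= 0)
            throw new ArgumentOutOfRangeException();
        this.bufferSize = bufferSize;
        this.growBy = growBy;
        Grow(initialCount); // pre-allocate early, before the heap gets busy
    }

    public int Available
    {
        get { lock (sync) { return free.Count; } }
    }

    public byte[] CheckOut()
    {
        lock (sync)
        {
            if (free.Count == 0) Grow(growBy);
            return free.Pop();
        }
    }

    public void CheckIn(byte[] buffer)
    {
        if (buffer == null || buffer.Length != bufferSize)
            throw new ArgumentException("Buffer does not belong to this pool.");
        lock (sync) { free.Push(buffer); }
    }

    private void Grow(int count)
    {
        for (int i = 0; i < count; i++) free.Push(new byte[bufferSize]);
        // Induce a collection before any new buffer is handed out (and pinned),
        // so the collector gets a chance to pack them together first.
        GC.Collect();
    }
}
```

A server would typically create one pool at startup, for example new BufferPool(2048, 500, 500), check a buffer out before each socket call and check it back in from the completion callback.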

Interestingly enough when we found this bug we already knew about the pinning behavior of socket operations, but had only solved half of it. All of our BeginReceive calls were using the BufferPool because we knew the buffers would remain pinned until we received data from a client, but the BeginSend calls were not using the pool. We had not even considered the fact that sending a few KB of data might take long enough to pin memory, fragment the heap, and cause an OutOfMemoryException. But there is one case where it does: timeouts. The Windows TCP subsystem is very forgiving. If a client loses its connection and the server isn't explicitly told about it, the next piece of data you try to send to that client socket will end up being pinned while the TCP subsystem waits for the client to respond. It can take up to 5 minutes with the default configuration of Windows for the TCP subsystem to figure out the client isn't really there. During that entire time your buffer is pinned in memory. *poof* OutOfMemoryException.
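Here is a rough sketch of what routing sends through pooled buffers might look like. PooledSender and its tiny ConcurrentBag pool are stand-ins invented for this example (a real server would use a properly pre-allocated pool), and it assumes each payload fits in a single buffer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net.Sockets;

// Hypothetical helper: the only array ever pinned by a send is a pooled one.
public static class PooledSender
{
    const int BufferSize = 2048;

    // Stand-in pool, filled up front; falls back to a fresh array if it runs dry.
    static readonly ConcurrentBag<byte[]> Pool = new ConcurrentBag<byte[]>();

    static PooledSender()
    {
        for (int i = 0; i < 16; i++) Pool.Add(new byte[BufferSize]);
    }

    public static void Send(Socket socket, byte[] payload)
    {
        if (payload.Length > BufferSize)
            throw new ArgumentException("Payload does not fit in one pooled buffer.");

        byte[] buffer;
        if (!Pool.TryTake(out buffer)) buffer = new byte[BufferSize];
        Buffer.BlockCopy(payload, 0, buffer, 0, payload.Length);

        try
        {
            socket.BeginSend(buffer, 0, payload.Length, SocketFlags.None, ar =>
            {
                try { socket.EndSend(ar); }
                catch (SocketException) { /* client went away, e.g. after a TCP timeout */ }
                catch (ObjectDisposedException) { /* socket was closed underneath us */ }
                finally { Pool.Add(buffer); } // back to the pool only once the send is over
            }, null);
        }
        catch
        {
            Pool.Add(buffer); // the send never started, so nothing is pinned
            throw;
        }
    }
}
```

The extra copy is the price: the caller's array is never handed to Winsock, so even a send that hangs for minutes only pins a buffer that was allocated low in the heap.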

Unfortunately, pre-allocating buffers does not completely fix the issue of running out of memory. There are also some other limits to the size of a .NET process's virtual memory space that are very complicated and I won't talk about, but basically you end up with anywhere from 1/2 to 2/3 usable virtual memory without running the risk of OutOfMemoryException. So, if you have 2GB virtual memory available (standard on a 32bit machine), you end up with about 1.3GB you can actually use reliably. Of course, this varies, and some applications will be able to use more, or maybe less. Your mileage may vary.

Don't fret, all of the issues I've talked about in here have been fixed since SoapBox Server 2005 SR1. And with the most common usage patterns people were not actually affected to begin with.

I hope this was at least marginally interesting to someone. :) Next up, I'll probably talk about limitations we discovered in the Windows Socket infrastructure, or maybe async IO, IOCP, and worker threadpools, or maybe how in the world we actually test at this scale. Only time will tell, unless Chris beats me to it.

Copyright ©2006 JD Conley

Comments (17)  
Re: How to Build Scalable .NET Server Applications: Memory Management    By Jaco Erasmus on 7/28/2006 2:40 PM
Thanx for the interesting information, I will definitely keep it in mind.

I would definitely like to see you or Chris talk about your experiences with worker threadpools.

Re: How to Build Scalable .NET Server Applications: Memory Management    By Greg Young on 8/3/2006 12:46 PM
Is it better to build very large segments then hand out arraysegments or is it better to just initialize many small segments in groups and hand them out directly? In the second case I could also shrink my usage fairly easily if I was no longer at a peak usage.

let me know your thoughts.

gregoryyoung1 at gmail

Re: How to Build Scalable .NET Server Applications: Memory Management    By jconley on 8/3/2006 12:55 PM
As far as pure memory utilization it would be better to use a smaller number of very large buffers and just hand out segments for the socket operation. However, then you have to keep track of which segments are available, rather than just checking the buffer back into your pool after the socket operation is complete.

We chose to use the second method mainly for ease of development, but your idea of shrinking the pool also adds to the credibility of this method

Another idea we had recently was to have pools of varying size buffers. One of the issues now is if an outgoing packet overflows a buffer we grab another, and another, and another, until we have enough buffers to fulfill the request, then queue all those up for delivery. It works great, but would be a lot less complicated if we just passed in one big buffer and let winsock deal with splitting it up as MTU's dictate. It looks like we'll be releasing the source to some of this stuff soon, so you'll get to see what I'm talking about.

We have all of this buffer madness, compression, and TLS encryption all wrapped up into one CompositeNetworkStream.

Re: How to Build Scalable .NET Server Applications: Memory Management    By Greg Young on 8/3/2006 1:32 PM
"However, then you have to keep track of which segments are available, rather than just checking the buffer back into your pool after the socket operation is complete."

In my case the buffers are of a fixed size (1024). So what I do is break up my big chunk (8mb) into segments ahead of time and stick them into a queue. When asked for a buffer I just grab one from the queue ... if the queue gets to be too small I create another big buffer and break it placing items into the queue.

When someone releases a buffer I put it in the queue.

It may seem like a lot of overhead but is certainly less than tracking :) and my buffer allocations are quite fast.

Having fixed sized buffers helps a lot, I do a similar task to what you describe in the chaining of buffers if needed for larger messages.
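For readers following along, the scheme described in this comment might be sketched as below. This is not Greg's actual code; SegmentPool and its members are invented names, and for simplicity it adds a new chunk only when the queue is completely empty rather than when it "gets to be too small".

```csharp
using System;
using System.Collections.Generic;

// Illustrative pool: carves big arrays into fixed-size ArraySegments.
public sealed class SegmentPool
{
    private readonly object sync = new object();
    private readonly Queue<ArraySegment<byte>> free = new Queue<ArraySegment<byte>>();
    private readonly int segmentSize;
    private readonly int segmentsPerChunk;

    public SegmentPool(int segmentSize, int segmentsPerChunk)
    {
        this.segmentSize = segmentSize;
        this.segmentsPerChunk = segmentsPerChunk;
        AddChunk();
    }

    public int Available
    {
        get { lock (sync) { return free.Count; } }
    }

    public ArraySegment<byte> Take()
    {
        lock (sync)
        {
            if (free.Count == 0) AddChunk();
            return free.Dequeue();
        }
    }

    public void Release(ArraySegment<byte> segment)
    {
        lock (sync) { free.Enqueue(segment); }
    }

    private void AddChunk()
    {
        // One big array per chunk; arrays of 85,000 bytes or more are
        // allocated on the large object heap.
        byte[] chunk = new byte[segmentSize * segmentsPerChunk];
        for (int i = 0; i < segmentsPerChunk; i++)
            free.Enqueue(new ArraySegment<byte>(chunk, i * segmentSize, segmentSize));
    }
}
```

With new SegmentPool(1024, 8192) each chunk is the 8MB block mentioned above, and a segment is passed to a socket call as seg.Array, seg.Offset, seg.Count.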

Re: How to Build Scalable .NET Server Applications: Memory Management    By jconley on 8/3/2006 1:43 PM
Your algorithm is very similar to ours, except we allocate many fixed size, yet separately referenced buffers at startup time. Or perhaps, I'm just misunderstanding what you're doing and you have the same net effect.

I think having smaller individually referenced buffers would also help .NET to defragment memory, should it need to move something around later in the life of the application.

For example, if you have to allocate another big contiguous chunk of memory, it's likely it's not going to be able to be defragmented as some of it will always be in use and pinned. With many smaller buffers it should be easier for .NET to move things around when pieces of the pool are no longer pinned.

I also wonder about the overhead of passing that large of a buffer into winsock. Since it is one big buffer does the whole 8MB chunk make the transition from .NET to winsock land even though you are only using a 1KB range of it?

Re: How to Build Scalable .NET Server Applications: Memory Management    By Greg Young on 8/3/2006 2:04 PM
"I think having smaller individually referenced buffers would also help .NET to defragment memory, should it need to move something around later in the life of the application."

I think that with the big buffers there is no need to defragment .. they are being put on the LOH which doesn't compress (at this time)


"I also wonder about the overhead of passing that large of a buffer into winsock. Since it is one big buffer does the whole 8MB chunk make the transition from .NET to winsock land even though you are only using a 1KB range of it?"

ArraySegment should take care of that (?), I would imagine that the entire array gets pinned (its pinned at present anyways in the LOH as there is no compression) but the external code is just being told an address and a maximum size.

Re: How to Build Scalable .NET Server Applications: Memory Management    By jconley on 8/3/2006 2:09 PM
"I think that with the big buffers there is no need to defragment .. they are being put on the LOH which doesn't compress (at this time)"

True.


"ArraySegment should take care of that (?)"

Yeah, that was my thought as well. I'm not really sure on the behavior of ArraySegment, but I would guess it would take care of passing that huge buffer around. Though it's not something I'm very familiar with (haven't actually used them in practice yet).

Re: How to Build Scalable .NET Server Applications: Memory Management    By Greg Young on 8/3/2006 2:47 PM
JD: You think it would be worth me releasing my buffer pool for others?


Re: How to Build Scalable .NET Server Applications: Memory Management    By jconley on 8/3/2006 2:47 PM
"You think it would be worth me releasing my buffer pool for others?"

Probably, but we're probably the only two people working on this sort of thing in .NET. :)

Re: How to Build Scalable .NET Server Applications: Memory Management    By Aaron on 8/7/2006 10:12 AM
Not the only two ;) I love writing servers and I'm finding the blog entries here very useful because there is a serious lack of information on writing scalable servers in .NET or otherwise.

You mentioned writing an implementation of IIS in managed code which is one of the first things I did when .NET was released. I scoured everywhere for information on writing servers but the only useful thing I found was Richter's Server Side Application Development. I don't think I've seen anything on memory management. The server worked though and scaled to a couple thousand users (though it really only needed to scale to one user on a tablet-like PC, but what fun is that?) I didn't do any memory management at all and reading this made me slap my forehead in a duh moment.

I now find myself writing another server and once again am looking for information on creating scalable servers in .NET. I'd love to see your buffer pool implementation Greg. While I get the general gist from the blog and comments, there might be little nuances I'm missing.

I look forward to more posts as this is pretty much the only place I've found on the tips and tricks of writing scalable servers.

Re: How to Build Scalable .NET Server Applications: Memory Management    By Sam on 11/15/2006 9:38 AM
By all means please release the buffer pool for others.

Re: How to Build Scalable .NET Server Applications: Memory Management    By jconley on 11/15/2006 9:40 AM
It looks like it will be released in our upcoming release of SoapBox Studio 2007 (a combination of all of Coversant's SDK's in one installer).

Re: How to Build Scalable .NET Server Applications: Memory Management    By Nevile on 11/16/2006 9:48 AM
Wow, this was an enlightening article. Even if I was the only person to read it, it was worth your time writing it. Many thanks. We are writing a very similar sounding service which needs to sustain up to 120,000 concurrent TCP clients. A few paragraphs from you have probably saved me hundreds of hours of work!

Thankyou, thankyou !

Re: How to Build Scalable .NET Server Applications: Memory Management    By Greg Young on 6/18/2007 3:51 PM
ok guys I finally put it up with an explanation http://codebetter.com/blogs/gregyoung/archive/2007/06/18/async-sockets-and-buffer-management.aspx?CommentPosted=true#commentmessage

Re: How to Build Scalable .NET Server Applications: Memory Management    By Eddie Ray on 8/2/2007 1:12 PM
Thanks for the information. I stumbled across the article while troubleshooting a socket issue (v1.1) where we get the "lacked sufficient buffer space or because a queue was full" error periodically. I am still not sure if pinning is what is causing our failures but it was an interesting read anyway. I have found that if we drop the /3GB switch from boot.ini our issue is drastically reduced, no explanation on this yet. Just weird, we need to get the code up to 2.0 where the issue seems to disappear. Anyway, enough about our issues.. thanks for posting good info and saving all of us TIME.

Re: How to Build Scalable .NET Server Applications: Memory Management    By jconley on 8/2/2007 1:14 PM
usually the lack sufficient buffer space error message is caused by not having enough non-paged kernel memory available. this is definitely something that removing the /3gb switch would help with. of course, then you have less virtual memory available to your application. tradeoffs... :) go 2.0 and 64 bit!

Re: How to Build Scalable .NET Server Applications: Memory Management    By Aron Weiler on 9/20/2007 2:24 PM
I don't know if anyone here has any interest in seeing this, but this is the implementation of a buffer pool that (I believe) works really well.

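using System;
using System.Collections.Generic;

public static class ChannelBuffer
{
    // Free buffers. The queue object also serves as the lock for the pool.
    static readonly Queue<byte[]> queue = new Queue<byte[]>();
    public static readonly int RecvBufferSize = 1024 * 2 + 512;
    static readonly int DefaultBufferCount = 1024;

    static int channelBufferCount = DefaultBufferCount;

    static ChannelBuffer()
    {
        // AppConfigHelper and LogMgr are application helpers that are not shown here.
        channelBufferCount = AppConfigHelper.GetValue("ChannelBufferCount", DefaultBufferCount);

        LogMgr.Trace("ChannelBuffer", "ChannelBufferCount={0}", channelBufferCount);

        AddBuffers();
    }

    // Allocates another batch of buffers, then forces a collection so they
    // get compacted before any of them is handed out and pinned.
    public static void AddBuffers()
    {
        lock (queue)
        {
            int count = channelBufferCount;

            while (--count >= 0)
            {
                queue.Enqueue(new byte[RecvBufferSize]);
            }
        }

        GC.Collect();
        GC.Collect();
    }

    public static byte[] GetBuffer()
    {
        lock (queue)
        {
            if (queue.Count == 0)
            {
                AddBuffers();
            }

            return queue.Dequeue();
        }
    }

    // Returns the buffer to the pool and clears the caller's reference
    // so it cannot be reused by accident.
    public static void ReturnBuffer(ref byte[] buffer)
    {
        if (buffer == null) return;

        lock (queue)
        {
            queue.Enqueue(buffer);
        }

        buffer = null;
    }
}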

