Char Memory Allocation in C
November 8, 2006 12:19 PM
A question about memory allocation in C. Yes, I have to use C. There's
So I'm writing a program to build some trees, and the relevant part of my code looks pretty much like this:
The problem is, sizeof(node) returns 16 bytes = 128 bits!! So I'm allocating more than I need here! And if I manually use malloc() to allocate 13 bytes, I get overflow errors.
Also interestingly, I'm able to stick 3 more chars in there, without increasing the node size. This suggests to me that my compiler is allocating memory in 32-bit chunks. Can't I get it to allocate 1 byte at a time? This is GCC on 32-bit linux, FWIW.
So I'm writing a program to build some trees, and the relevant part of my code looks pretty much like this:
struct _nodeNow, this is a very memory-intense operation, so I want these structs to be as small as possible. (Yes, it really does matter) When I do the math on memory size, I see that I should need (32 bits*3) + 8bits = 104 bits = 13 bytes.
{
struct node *parent;
char flags;
struct _node *l;
struct _node *r;;
};
typedef struct _node node;
The problem is, sizeof(node) returns 16 bytes = 128 bits!! So I'm allocating more than I need here! And if I manually use malloc() to allocate 13 bytes, I get overflow errors.
Also interestingly, I'm able to stick 3 more chars in there, without increasing the node size. This suggests to me that my compiler is allocating memory in 32-bit chunks. Can't I get it to allocate 1 byte at a time? This is GCC on 32-bit linux, FWIW.
Yeah, check out this wikipedia page, particularly the implementation section. sizeof in general is going to align on word boundaries and pad the struct to meet the next boundary.
A refactoring might do you some good, by not using structs. Makes the code much harder to use/understand, but it could save you memory space. Imagine that instead of a struct, you make a series of arrays for each element in your current struct (one array for parent, one array for l, for r, for flags). You could start off with a small array (or pre-alloc for expected initial size) and add to each array as necessary. This way I think you could get away with using the minimal memory possible.
posted by RustyBrooks at 12:30 PM on November 8, 2006
A refactoring might do you some good, by not using structs. Makes the code much harder to use/understand, but it could save you memory space. Imagine that instead of a struct, you make a series of arrays for each element in your current struct (one array for parent, one array for l, for r, for flags). You could start off with a small array (or pre-alloc for expected initial size) and add to each array as necessary. This way I think you could get away with using the minimal memory possible.
posted by RustyBrooks at 12:30 PM on November 8, 2006
The packing answer was what I needed. Thanks!
If I need to optimize even further, I'll look into what you advised RustyBrooks.
I'll just be happy when this little project is over and I can go back to writing in civilized languages that don't make me manage my own memory. *grumble grumble*
posted by chrisamiller at 12:33 PM on November 8, 2006
If I need to optimize even further, I'll look into what you advised RustyBrooks.
I'll just be happy when this little project is over and I can go back to writing in civilized languages that don't make me manage my own memory. *grumble grumble*
posted by chrisamiller at 12:33 PM on November 8, 2006
You may run into problems if you just turn on packing -- modern processors really like to load word-sized data from word-aligned addresses, and can take a significant performance hit (or fail completely) on unaligned data. I agree that an alternative data structure might suit your needs better. (Or you could try to stash your flags in some low-order bits of the pointers, which should always be aligned and thus zero -- but that's really a hack.)
posted by xil at 12:33 PM on November 8, 2006
posted by xil at 12:33 PM on November 8, 2006
Your structs will pretty much always be word aligned in C. It's not going to run any slower, because splitting up a word (4 bytes) between two different variables would introduce tremendous time overhead.
This is pretty basic C knowledge, and expected behaviour; not a problem with your compiler or environment. You can read more at: Wikipedia: data structure alignment.
posted by ceribus peribus at 12:35 PM on November 8, 2006
This is pretty basic C knowledge, and expected behaviour; not a problem with your compiler or environment. You can read more at: Wikipedia: data structure alignment.
posted by ceribus peribus at 12:35 PM on November 8, 2006
If you're making enough of these structures that 3 bytes matters you should allocate them a bunch at a time and manage your own pool of them to avoid the significant malloc overhead. The memory manager is probably adding a dozen bytes or more to the block for bookkeeping and stuff.
posted by aubilenon at 12:37 PM on November 8, 2006
posted by aubilenon at 12:37 PM on November 8, 2006
aubilenon, do you happen to have a link or two that explains the basics of doing what you described?
I'm not a programming newbie, I'm just retarded when it comes to C and memory.
posted by chrisamiller at 1:04 PM on November 8, 2006
I'm not a programming newbie, I'm just retarded when it comes to C and memory.
posted by chrisamiller at 1:04 PM on November 8, 2006
Any chance you could get rid of the pointer to the parent of the current node? Usually when walking around trees you only need pointers going down, and you can use the stack (if you are programming recursively) to keep track of where you have been.
posted by Maxwell_Smart at 1:16 PM on November 8, 2006
posted by Maxwell_Smart at 1:16 PM on November 8, 2006
Yeah - removing the parent is a step that I'll likely take soon. For simplicity, I'm leaving it temporarily. Thanks.
posted by chrisamiller at 1:38 PM on November 8, 2006
posted by chrisamiller at 1:38 PM on November 8, 2006
aubilenon is correct. Almost all malloc implementations use at least four bytes of bookkeeping overhead to every allocation via a call to malloc. In addition, many implementations also align allocations to 16-byte boundaries, to improve memory access performance.
If you need to use memory efficiency allocate a single large block of memory (or perhaps a small number of intermediate sized ones, if you can't predict upfront how many nodes you'll be allocating). To allocate a node, use pointer arithmetic to increment a "free node pointer" and use this value instead of calling malloc(). Don't call free() on the pointer to these nodes, just discard them. Make sure you don't overrun the end of your block. This procedure will waste memory if you frequently destroy nodes, but will be efficient if you never or only rarely do so.
posted by RichardP at 2:13 PM on November 8, 2006
If you need to use memory efficiency allocate a single large block of memory (or perhaps a small number of intermediate sized ones, if you can't predict upfront how many nodes you'll be allocating). To allocate a node, use pointer arithmetic to increment a "free node pointer" and use this value instead of calling malloc(). Don't call free() on the pointer to these nodes, just discard them. Make sure you don't overrun the end of your block. This procedure will waste memory if you frequently destroy nodes, but will be efficient if you never or only rarely do so.
posted by RichardP at 2:13 PM on November 8, 2006
I don't know of any sites that talk about this stuff in C, so I'll just suggest a simple and fairly efficient implementation:
You get and release your nodes always through your own AllocateNode and FreeNode functions.
you have a bunch of fixed size pools of objects. Let's call it 256 objects per pool; that's big enough that malloc overhead is not so big a problem.
Something like:
Initialize free_node_queue to contain {0,1,2,...,255} It's a circular queue that tells you the index of which nodes are not allocated. If you want to make it more complicated and slower to allocate, but more memory efficient, you can keep a bitfield.
So keep a list of puddles (if you sort them by address you can free a little quicker, but I don't care). Each one has a circular queue of free nodes. If you free one you add it to the end of the queue, and if you allocate one you just hand out the first one in the queue, and pop it off the front of the queue. If there's no node free, move on to the next puddle. If there's no puddle with room, make a new one.
To free figure out, by address, which puddle the node belongs in, and just add it to the free_node_queue.
posted by aubilenon at 2:17 PM on November 8, 2006
You get and release your nodes always through your own AllocateNode and FreeNode functions.
you have a bunch of fixed size pools of objects. Let's call it 256 objects per pool; that's big enough that malloc overhead is not so big a problem.
Something like:
struct node_puddle {
node nodes[256];
char free_node_queue[256];
int queue_start, queue_end;
};
Initialize free_node_queue to contain {0,1,2,...,255} It's a circular queue that tells you the index of which nodes are not allocated. If you want to make it more complicated and slower to allocate, but more memory efficient, you can keep a bitfield.
So keep a list of puddles (if you sort them by address you can free a little quicker, but I don't care). Each one has a circular queue of free nodes. If you free one you add it to the end of the queue, and if you allocate one you just hand out the first one in the queue, and pop it off the front of the queue. If there's no node free, move on to the next puddle. If there's no puddle with room, make a new one.
To free figure out, by address, which puddle the node belongs in, and just add it to the free_node_queue.
posted by aubilenon at 2:17 PM on November 8, 2006
Wow, thanks for the detailed explanation. That's exactly the kind of thing I need. I'm doing bioinformatics work, and I'll be allocating tens of millions of these nodes. I've been banging my head against the wall trying to figure out ways to reduce the memory overhead, and this pooling idea looks like it's just what I need.
Thanks a million, guys!!
posted by chrisamiller at 2:46 PM on November 8, 2006
Thanks a million, guys!!
posted by chrisamiller at 2:46 PM on November 8, 2006
Have you tried making this small change?
posted by Araucaria at 2:49 PM on November 8, 2006
struct _node { struct node *parent; struct _node *l; struct _node *r;; char flags; }; typedef struct _node node;Even if you pack your structures, you could be slowing things down if your pointers aren't aligned on 4 or 8 byte boundaries.
posted by Araucaria at 2:49 PM on November 8, 2006
You can save a further 4 bytes per structure by using an XOR Linked List
posted by gadha at 2:56 PM on November 8, 2006
posted by gadha at 2:56 PM on November 8, 2006
Even if you pack your structures, you could be slowing things down if your pointers aren't aligned on 4 or 8 byte boundaries.
I think you're supposed to get aligned stuff back from malloc no matter what, but that might mean you're getting memory allocated in non-contiguous chunks which could again mean wasted memory.
I wonder if you can reduce some of the non-contiguous space and bookkeeping overhead by using calloc.
disclaimer: IANA C programmer. Even though I worked on real software projects in it for three years, we never got along well.
posted by weston at 3:52 PM on November 8, 2006
I think you're supposed to get aligned stuff back from malloc no matter what, but that might mean you're getting memory allocated in non-contiguous chunks which could again mean wasted memory.
I wonder if you can reduce some of the non-contiguous space and bookkeeping overhead by using calloc.
disclaimer: IANA C programmer. Even though I worked on real software projects in it for three years, we never got along well.
posted by weston at 3:52 PM on November 8, 2006
I feel bound to point out that C is exactly the right language for this, because anything more civilized is less likely to give you enough control to minimize your memory consumption.
Tens of millions of nodes, where your current design gives each node 12 bytes of pointers for one byte of payload, says to me that you should be listening to b1trot.
You can represent any balanced binary tree (not just a heap, which is a special sorted kind) in a flat array, with no pointers involved: the child nodes of any node [i] are always at [i * 2] and [i * 2 + 1]; the parent node of any node [j] is always at [j / 2]; the root node is at [1]. You'll need an extra flags bit to distinguish used nodes from unused ones. In your case, with a 1-byte payload, it can be a simple array of char. Use malloc() to allocate 50 megabytes of space to start with; use realloc() to grow it to twice its present size every time you run out.
Of course, if your tree isn't approximately balanced, this fails nastily.
posted by flabdablet at 5:04 PM on November 8, 2006
Tens of millions of nodes, where your current design gives each node 12 bytes of pointers for one byte of payload, says to me that you should be listening to b1trot.
You can represent any balanced binary tree (not just a heap, which is a special sorted kind) in a flat array, with no pointers involved: the child nodes of any node [i] are always at [i * 2] and [i * 2 + 1]; the parent node of any node [j] is always at [j / 2]; the root node is at [1]. You'll need an extra flags bit to distinguish used nodes from unused ones. In your case, with a 1-byte payload, it can be a simple array of char. Use malloc() to allocate 50 megabytes of space to start with; use realloc() to grow it to twice its present size every time you run out.
Of course, if your tree isn't approximately balanced, this fails nastily.
posted by flabdablet at 5:04 PM on November 8, 2006
This thread is closed to new comments.
posted by sbutler at 12:27 PM on November 8, 2006