
Understanding Memory Usage In Python

I'm trying to understand how Python is using memory to estimate how many processes I can run at a time. Right now I process large files on a server with large amounts of RAM (~90-

Solution 1:

I'm going to suggest that you back off and instead approach this in a way that directly addresses your goal: shrinking peak memory use in the first place. No amount of analysis & fiddling later can overcome a doomed approach to begin with ;-)

Concretely, you got off on the wrong foot at the very first step, via data = f.read(). That alone guarantees your program can't possibly scale beyond a data file that fits entirely in RAM, with room to spare for the OS, Python, and everything else.

Do you actually need all the data to be in RAM at one time? There are too few details to tell about later steps, but obviously not at the start, since you immediately want to throw away 75% of the lines you read.

So start off by doing that incrementally instead:

def getlines4(f):
    # enumerate is 0-based, so i % 4 == 1 keeps the second
    # line of each group of 4 (one line in every four)
    for i, line in enumerate(f):
        if i % 4 == 1:
            yield line

Even if you do nothing other than just that much, you can skip directly to the result of step 3, saving an enormous amount of peak RAM use:

with open(file, 'r') as f:
    data = list(getlines4(f))

Now peak RAM need is proportional to the number of bytes in the lines you actually care about, instead of to the total number of bytes in the file.
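If you want to verify that on your own data, the standard library's tracemalloc module reports the peak memory allocated through Python's allocators. A minimal sketch, assuming a hypothetical file path 'data.txt' with the same 4-line record layout:

import tracemalloc

def getlines4(f):
    for i, line in enumerate(f):
        if i % 4 == 1:
            yield line

tracemalloc.start()
with open('data.txt', 'r') as f:   # 'data.txt' stands in for your real file
    data = list(getlines4(f))
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print('peak traced memory: %.1f MiB' % (peak / 2**20))

Wrap the same measurement around data = f.read() for comparison; the peaks should differ by roughly the ratio of total file size to the size of the lines you keep.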

To continue making progress, instead of materializing all the lines of interest in data in one giant gulp, feed the lines (or chunks of lines) incrementally to your worker processes too. There wasn't enough detail for me to suggest concrete code for that, but keep the goal in mind and you'll figure it out: you only need enough RAM to keep incrementally feeding lines to worker processes, and to save away however much of the worker processes' results you need to keep in RAM. It's possible that peak memory use doesn't need to be more than "tiny", regardless of input file size.
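For illustration, here's one shape that could take, using multiprocessing.Pool.imap, which pulls from the generator lazily. process_line, the worker count, and the chunk size are all placeholders, since we don't know what your workers actually do:

import multiprocessing as mp

def getlines4(f):
    for i, line in enumerate(f):
        if i % 4 == 1:
            yield line

def process_line(line):
    # placeholder for whatever real work your workers do
    return len(line)

if __name__ == '__main__':
    with open('data.txt', 'r') as f, mp.Pool(processes=4) as pool:
        # imap consumes the generator lazily, shipping lines to workers
        # a chunk at a time instead of materializing them all up front
        for result in pool.imap(process_line, getlines4(f), chunksize=1024):
            pass  # consume or accumulate only what you actually need

The key point is that nothing here ever holds more than a bounded window of lines and results in memory at once.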

Fighting memory management details instead is enormously harder than taking a memory-friendly approach to begin with. Python itself has several memory-management subsystems, and a great deal can be said about each of them. They in turn rely on the platform C malloc/free facilities, about which there's also a great deal to learn. And we're still not at a level that has anything directly to do with what your operating system reports for "memory use". The platform C libraries in turn rely on platform-specific OS memory management primitives, which - typically - only OS kernel memory experts truly understand.

The answer to "why does the OS say I'm still using N GiB of RAM?" can rely on application-specific details in any one of those layers, or even on unfortunate more-or-less accidental interactions among them. Far better to arrange not to need to ask such questions to begin with.

EDIT - about CPython's obmalloc

It's great that you gave some runnable code, but not so great that nobody but you can run it since nobody else has your data ;-) Things like "how many lines are there?" and "what's the distribution of line lengths?" can be critical, but we have no way to guess.

As I noted before, application-specific details are often necessary to out-think modern memory managers. They're complex, and behavior at all the levels can be subtle.

Python's primary object allocator ("obmalloc") requests "arenas" from the platform C malloc, chunks of 2**18 bytes. So long as that's the Python memory system your application is using (which can't be guessed at because we don't have your data to work with), 256 KiB is the smallest granularity at which memory is requested from, or returned to, the C level. The C level in turn typically has "chunk things up" strategies of its own, which vary across C implementations.

A Python arena is in turn carved into 4 KiB "pools", each of which dynamically adapts to be carved into smaller chunks of a fixed size per pool (8-byte chunks, 16-byte chunks, 24-byte chunks, ..., 8*i-byte chunks per pool).
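If you're curious, CPython exposes an underscore-private (but documented) hook that dumps obmalloc's current state to stderr, including how many arenas exist and how pools are distributed across those size classes. This is CPython-specific, and the output format isn't guaranteed across releases:

import sys

# CPython-only: prints low-level obmalloc statistics to stderr -
# arenas allocated, pools in use per size class, free blocks, etc.
sys._debugmallocstats()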

So long as a single byte in an arena is being used for live data, the entire arena must be retained. If that means the other 262,143 arena bytes sit unused, tough luck. As your output shows, all the memory is returned in the end, so why do you really care? I understand it's an abstractly interesting puzzle, but you're not going to solve it short of making major efforts to understand the code in CPython's obmalloc.c. For a start. Any "summary" would leave out a detail that's actually important to some application's microscopic behavior.

Plausible: your strings are short enough that space for all the string object headers and contents (the actual string data) is obtained from CPython's obmalloc. They're going to be splattered all over multiple arenas. An arena might look like this, where "H" represents pools from which string object headers are allocated, and "D" pools from which space for string data is allocated:

HHDDHHDDHHDDHHDDHHDDHHDDHHDDHHDDHHDDHHDDHHDDHHDDHHDDHHDD...

In your method1 they'll tend to alternate "like that" because creating a single string object requires allocating space separately for the string object header and the string object data. When you go on to throw out 3/4ths of the strings you created, more-or-less 3/4ths of that space becomes reusable to Python. But not one byte can be returned to the platform C level, because there's still live data sprayed all over the arena: the quarter of the string objects you didn't throw away (here "-" means space available for reuse):

HHDD------------HHDD------------HHDD------------HHDD----...

There's so much free space that, in fact, it's possible that the less wasteful method2 can get all the memory it needs from the -------- holes left over from method1 even when you don't throw away the method1 result.
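On Linux you can watch this effect directly by reading the process's resident set size before and after throwing strings away. A sketch under stated assumptions - Linux's /proc filesystem, and survivors scattered across arenas, which is exactly what keeping every 4th string produces:

def rss_kib():
    # Linux-only: current resident set size, in KiB, from /proc
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

strings = [('x%d' % i) * 4 for i in range(10**6)]  # a million small strings
print('after allocating:', rss_kib(), 'KiB')

strings = strings[::4]  # keep every 4th; survivors stay sprayed across arenas
print('after dropping 3/4:', rss_kib(), 'KiB')  # typically barely shrinks

Exact numbers depend on the Python release and the platform malloc, but the second figure generally stays far closer to the first than the "we freed 3/4 of the data" intuition suggests.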

Just to keep things simple ;-) , I'll note that some of those details about how CPython's obmalloc gets used vary across Python releases too. In general, the more recent the Python release, the more it tries to use obmalloc first instead of the platform C malloc/free (because obmalloc is generally faster).

But even if you use the platform C malloc/free directly, you can still see the same kinds of things happening. Kernel memory system calls are typically more expensive than running code purely in user space, so platform C malloc/free routines typically have their own strategies for "ask the kernel for much more memory than we need for a single request, and carve it up into smaller pieces ourself".

Something to note: neither Python's obmalloc nor platform C malloc/free implementations ever move live data on their own. Both return memory addresses to clients, and those cannot change. "Holes" are an inescapable fact of life under both.
