
Is There A Better Way To Do An "unravel" Function In Python?

I was faced with the problem of executing n concurrent events that all return iterators to the results they acquired. However, there was an optional limit parameter that capped the total number of results to take.

Solution 1:

What you're doing here is almost just zip.

You want a flat iterable, rather than an iterable of sub-iterables, but chain fixes that.

And you want to take only the first N values, but islice fixes that.

So, if the lengths are all equal:

>>> list(chain.from_iterable(zip(a, b)))
[1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
>>> list(islice(chain.from_iterable(zip(a, b)), 7))
[1, 6, 2, 7, 3, 8, 4]

But if the lengths aren't equal, that will stop as soon as the first iterable finishes, which you don't want. And the only alternative in the stdlib is zip_longest, which fills in missing values with None.

You can pretty easily write a zip_longest_skipping (which is effectively the roundrobin recipe in Solution 2), but you can also just use zip_longest and filter out the fill values:

>>> list(filter(None, chain.from_iterable(zip_longest(a, b, c, d))))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]

(Obviously this doesn't work if your values can be falsy—filter(None, …) drops 0, empty strings, and None alike—but when they're all positive integers it works fine. To handle arbitrary values, do sentinel = object(), pass it to zip_longest as the fillvalue, then filter on x is not sentinel.)
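A minimal sketch of that sentinel variant (the sample lists here are assumptions chosen to include falsy values, not data from the original post):

```python
from itertools import chain, zip_longest

a = [1, 0, 3]   # contains a falsy value that filter(None, ...) would wrongly drop
b = [None, 5]   # contains a legitimate None

sentinel = object()  # unique marker that can't collide with any real value
merged = [x for x in chain.from_iterable(zip_longest(a, b, fillvalue=sentinel))
          if x is not sentinel]
print(merged)   # [1, None, 0, 5, 3] -- the 0 and the real None both survive
```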

Solution 2:

From the itertools example recipes:

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))

Use itertools.islice to enforce your with_limit, e.g.:

print(list(itertools.islice(roundrobin(c, d), 3)))

>>> list(roundrobin(a, b, c, d))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]

Solution 3:

For what you're actually trying to do, there's probably a much better solution.

I execute 2,000 url requests on 8 threads but just want the first 100 results, but not all 100 from the same potential thread.

OK, so why are the results in 8 separate iterables? There's no good reason for that. Instead of giving each thread its own queue (or global list and lock, or whatever you're using) and then trying to zip them together, why not have them all share a queue in the first place?

In fact, that's the default way that almost any thread pool is designed (including multiprocessing.Pool and concurrent.futures.Executor in the stdlib). Look at the main example for concurrent.futures.ThreadPoolExecutor:

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

That's almost exactly your use case—spamming a bunch of URL downloads out over 5 different threads and gathering the results as they come in—without your problem even arising.

Of course it's missing with_limit, but you can just wrap that as_completed iterable in islice to handle that, and you're done.
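A sketch of that combination—fetch and the counts here are stand-ins for the real URL loader and the 2,000-request / 100-result figures from the question:

```python
import concurrent.futures
from itertools import islice

def fetch(n):
    # Stand-in for the real load_url worker; any callable works
    return n * n

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    futures = [executor.submit(fetch, n) for n in range(50)]
    # as_completed yields futures in completion order, regardless of which
    # thread ran them; islice stops after the first 10, giving with_limit
    first_ten = [f.result() for f in
                 islice(concurrent.futures.as_completed(futures), 10)]

print(len(first_ten))  # 10
```

Because as_completed yields results as they finish, the limit naturally draws from whichever threads complete first rather than draining one thread's results.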

Solution 4:

This uses a generator and zip_longest (izip_longest in Python 2) to pull one item at a time from multiple iterators:

from itertools import zip_longest


def unravel(cap, *iters):
    counter = 0
    for group in zip_longest(*iters):
        for entry in (g for g in group if g is not None):
            yield entry
            counter += 1
            if counter >= cap:
                return  # break would only exit the inner loop
