Is There A Better Way To Do An "unravel" Function In Python?
Solution 1:
What you're doing here is almost just zip. You want a flat iterable, rather than an iterable of sub-iterables, but chain fixes that. And you want to take only the first N values, but islice fixes that.
So, if the lengths are all equal:
>>> from itertools import chain, islice
>>> a, b = [1, 2, 3, 4, 5], [6, 7, 8, 9, 10]
>>> list(chain.from_iterable(zip(a, b)))
[1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
>>> list(islice(chain.from_iterable(zip(a, b)), 7))
[1, 6, 2, 7, 3, 8, 4]
But if the lengths aren't equal, that will stop as soon as the first iterable finishes, which you don't want. And the only alternative in the stdlib is zip_longest, which fills in missing values with None.
You can pretty easily write a zip_longest_skipping (which is effectively the round_robin in Peter's answer), but you can also just zip_longest and filter out the results:
>>> from itertools import zip_longest
>>> c, d = [1, 3, 5, 7], [2, 4, 6, 8]
>>> list(filter(None, chain.from_iterable(zip_longest(a, b, c, d))))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]
(Obviously this doesn't work as well if your values can be falsy, like empty strings or None, but when they're all positive integers it works fine… to handle the "or None" case, create a sentinel = object(), pass it to zip_longest as the fillvalue, then filter on x is not sentinel.)
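Here's a minimal sketch of that sentinel approach (the zip_longest_skipping name is just the one suggested above, not a stdlib function):

from itertools import chain, zip_longest

def zip_longest_skipping(*iterables):
    # A unique sentinel as the fill value means legitimate None
    # (or other falsy) items survive the filtering step.
    sentinel = object()
    return (x for x in chain.from_iterable(
                zip_longest(*iterables, fillvalue=sentinel))
            if x is not sentinel)

list(zip_longest_skipping(a, b, c, d)) then gives the same result as the filter(None, ...) version above, even if the inputs contain None.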
Solution 2:
From the itertools example recipes:
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))
Use itertools.islice to enforce your with_limit, e.g.:

print(list(islice(roundrobin(c, d), 3)))
>>> list(roundrobin(a, b, c, d))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]
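Assuming the same sample lists as in Solution 1, you can cap the combined stream, e.g. at the first seven items:

>>> list(islice(roundrobin(a, b, c, d), 7))
[1, 6, 1, 2, 2, 7, 3]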
Solution 3:
For what you're actually trying to do, there's probably a much better solution.
I execute 2,000 url requests on 8 threads but just want the first 100 results, but not all 100 from the same potential thread.
OK, so why are the results in 8 separate iterables? There's no good reason for that. Instead of giving each thread its own queue (or global list and lock, or whatever you're using) and then trying to zip them together, why not have them all share a queue in the first place?
In fact, that's the default way that almost any thread pool is designed (including multiprocessing.Pool and concurrent.futures.Executor in the stdlib). Look at the main example for concurrent.futures.ThreadPoolExecutor:
import concurrent.futures

# URLS is a list of URLs and load_url(url, timeout) downloads one page;
# both are defined in the full example in the concurrent.futures docs.
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
That's almost exactly your use case—spamming a bunch of URL downloads out over 5 different threads and gathering the results as they come in—without your problem even arising.
Of course it's missing with_limit, but you can just wrap that as_completed iterable in islice to handle that, and you're done.
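Here's a minimal sketch of that idea, assuming a fetch(url) download function (hypothetical, standing in for whatever your threads run) and the 8 threads / first 100 results from your question:

from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import islice

def first_n_results(urls, with_limit, max_workers=8):
    # fetch is a hypothetical download function; substitute your own.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fetch, url) for url in urls]
        # islice stops consuming as_completed after with_limit results.
        for future in islice(as_completed(futures), with_limit):
            yield future.result()

Used as, say, results = list(first_n_results(urls, 100)). Note that the leftover futures still run to completion when the executor shuts down; cancel the ones you no longer need if that matters.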
Solution 4:
This uses a generator and izip_longest to pull one item at a time from multiple iterators:
from itertools import izip_longest  # Python 2; use itertools.zip_longest on Python 3

def unravel(cap, *iters):
    counter = 0
    for group in izip_longest(*iters):
        for entry in (s for s in group if s is not None):
            yield entry
            counter += 1
            if counter >= cap:
                return  # a bare break would only exit the inner loop
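For example, with the same sample lists as in Solution 1:

>>> list(unravel(7, a, b))
[1, 6, 2, 7, 3, 8, 4]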