Concatenate Tuples Using Sum()
Solution 1:
The addition operator concatenates tuples in Python:
('a', 'b')+('c', 'd')
Out[34]: ('a', 'b', 'c', 'd')
From the docstring of sum:
Return the sum of a 'start' value (default: 0) plus an iterable of numbers
This means sum doesn't start with the first element of your iterable, but with an initial value that is passed as the start argument.
By default, sum is used with numbers, so the default start value is 0. Summing an iterable of tuples therefore requires starting with an empty tuple, and () is an empty tuple:
type(())
Out[36]: tuple
That is why sum(tuples, ()) concatenates the tuples.
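A minimal sketch of the point above, using a made-up sample value for tuples (the data is an assumption for illustration, not from the original answer). With the default start of 0 the first addition is 0 + tuple, which fails; with an empty tuple as the start, every addition is a tuple concatenation.

```python
# Sample data, made up for illustration.
tuples = [('a', 'b'), ('c', 'd'), ('e',)]

# Default start is 0, so the first step is 0 + ('a', 'b'), which raises TypeError.
try:
    sum(tuples)
    default_start_failed = False
except TypeError:
    default_start_failed = True

# With an empty tuple as the start value, each step concatenates tuples.
result = sum(tuples, ())

print(default_start_failed)  # True
print(result)                # ('a', 'b', 'c', 'd', 'e')
```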
As for performance, here is a comparison (the name it in the timings appears to be an alias for the itertools module):
%timeit sum(tuples, ())
The slowest run took 9.40 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 285 ns per loop
%timeit tuple(it.chain.from_iterable(tuples))
The slowest run took 5.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 625 ns per loop
Now with t2 of size 10000:
%timeit sum(t2, ())
10 loops, best of 3: 188 ms per loop
%timeit tuple(it.chain.from_iterable(t2))
1000 loops, best of 3: 526 µs per loop
So if your list of tuples is small, don't bother optimizing. If it's medium-sized or larger, you should use itertools.
Solution 2:
It works because addition is overloaded (on tuples) to return the concatenated tuple:
>>> () + ('hello',) + ('these', 'are') + ('my', 'tuples!')
('hello', 'these', 'are', 'my', 'tuples!')
That's basically what sum is doing: you give it an initial value of an empty tuple, and it then adds the tuples to that.
However, this is generally a bad idea because adding tuples creates a new tuple each time, so you create several intermediate tuples just to copy them into the concatenated tuple:
()
('hello',)
('hello', 'these', 'are')
('hello', 'these', 'are', 'my', 'tuples!')
That implementation has quadratic runtime behavior, because every addition copies everything accumulated so far. The quadratic behavior can be avoided by not creating the intermediate tuples.
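To make the quadratic growth concrete, here is a rough sketch that counts how many elements repeated addition copies. The helper name copies_made_by_repeated_addition is hypothetical, and the count is a simplified cost model rather than a measurement of CPython internals.

```python
def copies_made_by_repeated_addition(tuples):
    """Concatenate with repeated `+` and count the elements copied.

    Each `acc + tup` builds a new tuple holding len(acc) + len(tup) elements.
    """
    acc = ()
    copied = 0
    for tup in tuples:
        acc = acc + tup      # new tuple; everything so far is copied again
        copied += len(acc)
    return acc, copied


for n in (10, 100, 1000):
    data = [(i,) for i in range(n)]   # n one-element tuples
    _, copied = copies_made_by_repeated_addition(data)
    print(n, copied)
# 10 55
# 100 5050
# 1000 500500
```

For n one-element tuples the count is 1 + 2 + ... + n = n(n + 1) / 2, whereas building the result in a single pass only needs to touch each of the n elements once.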
>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
Using nested generator expressions:
>>> tuple(tuple_item for tup in tuples for tuple_item in tup)
('hello', 'these', 'are', 'my', 'tuples!')
Or using a generator function:
def flatten(it):
    for seq in it:
        for item in seq:
            yield item
>>> tuple(flatten(tuples))
('hello', 'these', 'are', 'my', 'tuples!')
Or using itertools.chain.from_iterable:
>>> import itertools
>>> tuple(itertools.chain.from_iterable(tuples))
('hello', 'these', 'are', 'my', 'tuples!')
And if you're interested in how these perform (using my simple_benchmark package):
import itertools
import simple_benchmark

def flatten(it):
    for seq in it:
        for item in seq:
            yield item

def sum_approach(tuples):
    return sum(tuples, ())

def generator_expression_approach(tuples):
    return tuple(tuple_item for tup in tuples for tuple_item in tup)

def generator_function_approach(tuples):
    return tuple(flatten(tuples))

def itertools_approach(tuples):
    return tuple(itertools.chain.from_iterable(tuples))

funcs = [sum_approach, generator_expression_approach, generator_function_approach, itertools_approach]
# Note: range(1, 2**i) yields 2**i - 1 one-element tuples for the key 2**i.
arguments = {(2**i): tuple((1,) for _ in range(1, 2**i)) for i in range(1, 13)}
b = simple_benchmark.benchmark(funcs, arguments, argument_name='number of tuples to concatenate')
b.plot()
(Python 3.7.2 64bit, Windows 10 64bit)
So while the sum approach is very fast if you concatenate only a few tuples, it will be really slow if you try to concatenate lots of tuples. The fastest of the tested approaches for many tuples is itertools.chain.from_iterable.
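If simple_benchmark is not installed, a rough comparison can be sketched with the standard-library timeit module alone. This is an assumption-laden stand-in for the benchmark above, not a reproduction of it: the input sizes and repeat count are arbitrary, and the absolute numbers will vary by machine and Python version.

```python
import itertools
import timeit


def sum_approach(tuples):
    return sum(tuples, ())


def itertools_approach(tuples):
    return tuple(itertools.chain.from_iterable(tuples))


REPEATS = 20  # arbitrary; kept small so the sketch runs quickly

for n in (10, 1000):
    data = tuple((1,) for _ in range(n))
    # Both approaches must agree before their timings are worth comparing.
    assert sum_approach(data) == itertools_approach(data)
    for func in (sum_approach, itertools_approach):
        seconds = timeit.timeit(lambda: func(data), number=REPEATS)
        print(f"{func.__name__:>20} n={n:>5}: {seconds / REPEATS:.2e} s per call")
```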
Solution 3:
That's clever, and I had to laugh because the help text expressly forbids strings, which are also immutable, yet it works for tuples:
sum(...)
    sum(iterable[, start]) -> value
    Return the sum of an iterable of numbers (NOT strings) plus the value of parameter 'start' (which defaults to 0). When the iterable is empty, return start.
You can add tuples to get a new, bigger tuple. And since you gave a tuple as a start value, the addition works.
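A short sketch of that contrast, using made-up sample values: sum rejects a string start value with a TypeError, while the same pattern with tuples goes through.

```python
# Strings are rejected explicitly: sum() refuses a str start value.
try:
    sum(['a', 'b', 'c'], '')
    strings_rejected = False
except TypeError:
    strings_rejected = True

# str.join is the usual way to concatenate strings instead.
joined = ''.join(['a', 'b', 'c'])

# Tuples are not special-cased, so the same pattern works for them.
concatenated = sum([('a',), ('b', 'c')], ())

print(strings_rejected)  # True
print(joined)            # abc
print(concatenated)      # ('a', 'b', 'c')
```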
Solution 4:
Just to complement the accepted answer with some more benchmarks:
import functools, operator, itertools
import numpy as np
N = 10000
M = 2
ll = tuple(tuple(x) for x in np.random.random((N, M)).tolist())
%timeit functools.reduce(operator.add, ll)
# 407 ms ± 5.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit functools.reduce(lambda x, y: x + y, ll)
# 425 ms ± 7.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit sum(ll, ())
# 426 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit tuple(itertools.chain(*ll))
# 601 µs ± 5.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit tuple(itertools.chain.from_iterable(ll))
# 546 µs ± 25.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
EDIT: The code has been updated to actually use tuples. As per the comments, the last two options are now wrapped in a tuple() constructor, and all the timings have been updated for consistency. The itertools.chain* options are still the fastest, but the margin is now smaller.
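One behavioral difference between the reduce-based variants and sum(ll, ()) that the timings do not show is how they treat an empty input. The sketch below is a side note under that assumption of an empty input; the variable names are made up for illustration.

```python
import functools
import operator

empty = ()

# sum() with an explicit start value handles an empty input.
summed = sum(empty, ())

# reduce() without an initial value raises TypeError on an empty input...
try:
    functools.reduce(operator.add, empty)
    reduce_failed = False
except TypeError:
    reduce_failed = True

# ...but accepts an initial value as its third argument, mirroring sum's start.
reduced = functools.reduce(operator.add, empty, ())

print(summed, reduce_failed, reduced)  # () True ()
```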
Solution 5:
The second argument, start, where you put (), is the object the addition starts from; it defaults to 0, for adding numbers.
Here is a sample implementation of sum (how I expect it to behave):
def sum(iterable, /, start=0):
    for element in iterable:
        start += element
    return start
Example:
>>> sum([1, 2, 3])
6
>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
>>> sum(tuples)
TypeError: unsupported operand type(s) for +=: 'int' and 'tuple'
>>> sum(tuples, ())
('hello', 'these', 'are', 'my', 'tuples!')
>>>
It works since tuple concatenation with + is supported.
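One caveat about the sample implementation: it accumulates with +=, which modifies a mutable start value in place. As far as I can tell, CPython's built-in sum accumulates with plain + and leaves the start object untouched. The sketch below contrasts the two; sum_inplace and sum_with_add are hypothetical names chosen here so the built-in is not shadowed.

```python
def sum_inplace(iterable, start=0):
    """The sample implementation from above: accumulates with `+=`."""
    for element in iterable:
        start += element
    return start


def sum_with_add(iterable, start=0):
    """Variant that accumulates with `+`, never modifying `start` in place."""
    result = start
    for element in iterable:
        result = result + element
    return result


tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
# Tuples are immutable, so both variants agree with the built-in here.
assert sum_inplace(tuples, ()) == sum_with_add(tuples, ()) == sum(tuples, ())

# With a mutable start value the two variants differ.
start_a, start_b = [], []
sum_inplace([[1], [2]], start_a)   # extends start_a in place
sum_with_add([[1], [2]], start_b)  # leaves start_b untouched
print(start_a)  # [1, 2]
print(start_b)  # []
```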
Effectively, sum(tuples, ()) gets translated to:
>>> () + ('hello',) + ('these', 'are') + ('my', 'tuples!')
('hello', 'these', 'are', 'my', 'tuples!')
>>>