Issue33214
Created on 2018-04-03 14:33 by Javier Dehesa, last changed 2022-04-11 14:58 by admin.
| Messages (9) | |||
|---|---|---|---|
| msg314881 - (view) | Author: Javier Dehesa (Javier Dehesa) | Date: 2018-04-03 14:33 | |
It is pretty trivial to concatenate a sequence of strings:
''.join([str1, str2, ...])
Concatenating a sequence of lists is for some reason significantly more convoluted. Some current options include:
sum([lst1, lst2, ...], [])
[x for y in [lst1, lst2, ...] for x in y]
list(itertools.chain(lst1, lst2, ...))
The first is the least recommendable but the most intuitive, and the third is the fastest but the most cumbersome (see https://stackoverflow.com/questions/49631326/why-is-itertools-chain-faster-than-a-flattening-list-comprehension ). None of these looks like "the one obvious way to do it" to me. Furthermore, I feel a dedicated concatenation method could be more efficient than any of these approaches.
If we accept that ''.join(...) is an intuitive idiom, why not provide the syntax:
[].join([lst1, lst2, ...])
And while we are at it:
().join([tpl1, tpl2, ...])
Like with str, these methods should only accept sequences of objects of their own class (e.g. we could do [].join(list(s) for s in seqs) if seqs contains lists, tuples and generators). The use case for non-empty joiners would probably be less frequent than for strings, but it also solves a problem that has no clean solution with the current tools. Here is what I would probably do to join a sequence of lists with [None, 'STOP', None]:
lsts = [lst1, lst2, ...]
joiner = [None, 'STOP', None]
lsts_joined = list(itertools.chain.from_iterable(lst + joiner for lst in lsts))[:-len(joiner)]
This is awful and inefficient (I am not saying it is the best or only possible way to solve the problem; it is just what I, a self-described experienced Python developer, might write).
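For illustration, the behaviour requested above can be sketched as a plain function. The name join_lists is hypothetical (no list.join method exists in Python), and the sketch assumes every element of the input is already a list:

```python
def join_lists(joiner, lists):
    """Hypothetical helper: concatenate lists, placing joiner between them.

    This only imitates what a list.join method might do; it is not an
    existing API.
    """
    result = []
    for index, lst in enumerate(lists):
        if index:  # insert the joiner before every list except the first
            result.extend(joiner)
        result.extend(lst)
    return result


lsts = [[1, 2], [3], [4, 5]]
flat = join_lists([], lsts)                      # plain concatenation
joined = join_lists([None, 'STOP', None], lsts)  # non-empty joiner
```

Unlike the slicing approach, this version does not build and then discard a trailing copy of the joiner, and it also behaves sensibly when the joiner is empty.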
|
|||
| msg314882 - (view) | Author: Christian Heimes (christian.heimes) | Date: 2018-04-03 14:40 | |
join() is a bad choice, because new developers will confuse list.join with str.join. We could turn list.extend(iterable) into list.extend(*iterable). Or you could just use extend with a chain iterator:
>>> import itertools
>>> l = []
>>> l.extend(itertools.chain([1], [2], [3]))
>>> l
[1, 2, 3]
|
|||
| msg314883 - (view) | Author: Javier Dehesa (Javier Dehesa) | Date: 2018-04-03 15:06 | |
Thanks Christian. I thought of join precisely because it performs conceptually the same function as with str, so the parallel between ''.join(), [].join() and ().join() looked more obvious. Also there is os.path.join and PurePath.joinpath, so the verb seemed well-established. As for shared method names, index and count are present both in sequences and str, although it is true that these return the same kind of object in all cases. I'm not saying your points aren't valid, though. Your proposed way with extend is, I guess, about the same as list(itertools.chain(...)), which could be considered enough. I just feel that it is not particularly convenient, especially for newer developers, who will probably gravitate towards sum(...) more than itertools or a nested generator expression, but I may be wrong. |
|||
| msg314885 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) | Date: 2018-04-03 15:23 | |
String concatenation: f'{a}{b}{c}'
List concatenation: [*a, *b, *c]
Tuple concatenation: (*a, *b, *c)
Set union: {*a, *b, *c}
Dict merging: {**a, **b, **c}
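As a runnable illustration of the unpacking forms listed above (the sample values are made up for the example), each one works for a fixed, known set of operands:

```python
a, b, c = [1, 2], [3], [4, 5]

as_list = [*a, *b, *c]    # list concatenation
as_tuple = (*a, *b, *c)   # tuple concatenation
as_set = {*a, *b, *c}     # set union

d1, d2 = {'x': 1}, {'x': 2, 'y': 3}
merged = {**d1, **d2}     # later mappings win on duplicate keys

s1, s2 = 'ab', 'cd'
text = f'{s1}{s2}'        # string concatenation
```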
|
|||
| msg352387 - (view) | Author: Josh Rosenberg (josh.r) | Date: 2019-09-13 18:35 | |
Note that all of Serhiy's examples are for a known, fixed number of things to concatenate/union/merge. str.join's API can be used for that by wrapping the arguments in an anonymous tuple/list, but it's more naturally suited to a variable number of things, and the unpacking generalizations haven't reached the point where:
[*seq for seq in allsequences]
is allowed.
list(itertools.chain.from_iterable(allsequences))
handles that just fine, but I could definitely see it being convenient to be able to do:
[].join(allsequences)
That said, a big reason str provides .join is because it's not uncommon to want to join strings with a repeated separator, e.g.:
# For not-really-csv-but-people-do-it-anyway
','.join(row_strings)
# Separate words with spaces
' '.join(words)
# Separate lines with newlines
'\n'.join(lines)
I'm not seeing even one motivating use case for list.join/tuple.join that would actually join on a non-empty list or tuple ([None, 'STOP', None] being rather contrived). If that's not needed, it might make more sense to do this with an alternate constructor (a classmethod), e.g.:
list.concat(allsequences)
which would avoid the cost of creating an otherwise unused empty list (the empty tuple is a singleton, so no cost is avoided there). It would also work equally well with both tuple and list (where making list.extend take varargs wouldn't help tuple, though it's a perfectly worthy idea on its own).
Personally, I don't find using itertools.chain (or its from_iterable alternate constructor) all that problematic (though I almost always import it with from itertools import chain to reduce the verbosity, especially when using chain.from_iterable). I think promoting itertools more is a good idea; right now, the notes on concatenation for sequence types mention str.join, bytes.join, and replacing tuple concatenation with a list that you call extend on, but don't mention itertools.chain at all, which seems like a failure to make the best solution the discoverable/obvious solution.
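The alternate-constructor idea can be approximated today with a small function. This is only a sketch: concat is a hypothetical name standing in for the proposed list.concat / tuple.concat, and accepting arbitrary iterables as elements is an assumption, not something the proposal specifies:

```python
from itertools import chain


def concat(cls, iterables):
    """Hypothetical stand-in for the proposed list.concat / tuple.concat."""
    # chain.from_iterable flattens one level lazily; cls builds the result.
    return cls(chain.from_iterable(iterables))


all_sequences = [[1, 2], (3,), range(4, 6)]
as_list = concat(list, all_sequences)
as_tuple = concat(tuple, all_sequences)
```

Passing the target type explicitly is what lets the same code serve both list and tuple, which is the advantage claimed for the classmethod form over a varargs list.extend.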
|
|||
| msg352530 - (view) | Author: Александр Семенов (iamsav) | Date: 2019-09-16 09:33 | |
In JavaScript, join() works the other way around:
['1','2','3'].join(', ')
so [].join() may confuse some people.
|
|||
| msg352531 - (view) | Author: Christian Heimes (christian.heimes) | Date: 2019-09-16 09:46 | |
> In JavaScript, join() works the other way around:
> ['1','2','3'].join(', ')
> so [].join() may confuse some people.
It would be too confusing to have two different approaches to joining strings in Python. Besides, ECMAScript 1 came out in 1997, 5 years after Python was first released. By that argument, it is JavaScript that should change.
|
|||
| msg352532 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) | Date: 2019-09-16 09:53 | |
How common is the case of variable number of things to concatenate/union/merge?
From my experience, in most cases this looks like:
result = []
for ...:
    # many complex statements
    # may include continue and break
    result.extend(items)  # may be intermixed with result.append(item)
So concatenating purely lists from some sequence is a very special case. And there are several ways to perform it.
result = []
for items in seq:
    result.extend(items)
# nothing wrong with this simple code, really
result = [x for items in seq for x in items]
# may be less efficient for really long sublists,
# but looks simple
result = list(itertools.chain.from_iterable(seq))
# if you are an itertools addict ;-)
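A self-contained version of the three approaches above, using made-up sample data, shows that they produce the same result:

```python
import itertools

seq = [[1, 2], [], [3, 4, 5]]

# 1. Explicit loop with extend
result_loop = []
for items in seq:
    result_loop.extend(items)

# 2. Nested list comprehension
result_comp = [x for items in seq for x in items]

# 3. itertools.chain.from_iterable
result_chain = list(itertools.chain.from_iterable(seq))
```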
|
|||
| msg352534 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) | Date: 2019-09-16 10:04 | |
It is history, but in 1997 Python had the same order of arguments as ECMAScript: string.join(words [, sep]). str.join() was added only in 1999 (226ae6ca122f814dabdc40178c7b9656caf729c2). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:59 | admin | set | github: 77395 |
| 2019-09-16 10:04:49 | serhiy.storchaka | set | messages: + msg352534 |
| 2019-09-16 09:53:43 | serhiy.storchaka | set | messages: + msg352532 |
| 2019-09-16 09:46:11 | christian.heimes | set | messages: + msg352531 |
| 2019-09-16 09:33:29 | iamsav | set | nosy: + iamsav; messages: + msg352530 |
| 2019-09-13 18:35:07 | josh.r | set | nosy: + josh.r; messages: + msg352387 |
| 2018-04-06 16:29:01 | eric.araujo | set | nosy: + eric.araujo |
| 2018-04-03 15:23:56 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg314885 |
| 2018-04-03 15:06:33 | Javier Dehesa | set | messages: + msg314883 |
| 2018-04-03 14:40:42 | christian.heimes | set | nosy: + christian.heimes; messages: + msg314882 |
| 2018-04-03 14:33:53 | Javier Dehesa | create | |

