Python string improvements (performance and somewhat readability)
While CPython does some magic to avoid quadratic complexity of repeated 'addition' to string-the-immutable-object achieving linear complexity for stuff like:
spam = ''
for i in range(1000):
spam += 'eat spam!\n'
This is by no means guaranteed as Python docs point out:
and we shouldn't depend on such a behaviour especially in stuff like
dump_status()
of nmci/ctx.py
that join more than a handful of original
strings and do it for each and every scenario. Actually, based on a crude benchmark, the example above is about twice
as slow as the more pythonic example below (yeah, such a simple thing is even better done by list comprehension):
spam = []
for i in range(1000)
spam.append('eat spam!')
spam = '\n'.join(spam)
So the main goal of this commit is to switch all instances of strings
being built incrementally to building a list (or where suitable, set)
instead and .join()-ing it afterwards and to change '{foo}{bar}'.format(foo=foo, bar=bar)
string to f'{foo}{bar}'
f-strings
I started this as a more thorough effort on removing string addition using + or += operators altogether but after creating a simple benchmark I realized that this, .join() and f-string have about the same performance with '%s%s' % (var1, var2) and '{}{}'.format(var1, var2) being only slightly slower. The only slower outlier is parametrized '{foo}{bar}'.format(foo=foo, bar=bar) so I f-string-ified those in following commits. I however left some changes done before benchmarking that IMO increase code readability.
example benchmark results on my system are:
$ ./python-string-related-perf.py
join fixed number of strings (4) to a long string (number=2000000):
'command %s --parameter %s' % (strings[0], strings[1]) : 0.6372304310025356
'command {} --parameter {}'.format(strings[0], strings[1]) : 0.6514693839999381
'command {arg} --parameter {param}'.format(arg=strings[0], param=strings[1]) : 1.0831694859989511
'command ' + str[0] + ' --parameter=' + str[1] : 0.3938722479979333
' '.join(['command', str[0], '--parameter=', str[1]]) : 0.40208850300041377
f'command {str[0]} --parameter {str[1]}' : 0.4099529650011391
build string from 50 to 80-long preexisting parts (number=5000):
build long string from 10 parts:
'\n'.join(strings) : 0.0017692799992801156
for i in strings: msgs.append(i); '\n'.join(msgs) : 0.003394540999579476
for i in strings: msg += msg + i + '\n' : 0.0055492060018877964
for i in strings: msg = f'{msg}{i}\n' : 0.004438264997588703
build long string from 100 parts:
'\n'.join(strings) : 0.009715741998661542
for i in strings: msgs.append(i); '\n'.join(msgs) : 0.027317003998177825
for i in strings: msg += msg + i + '\n' : 0.06035234099908848
for i in strings: msg = f'{msg}{i}\n' : 0.08395763999942574
build long string from 1000 parts:
'\n'.join(strings) : 0.09142985900325584
for i in strings: msgs.append(i); '\n'.join(msgs) : 0.2942033840008662
for i in strings: msg += msg + i + '\n' : 0.6274634230030642
for i in strings: msg = f'{msg}{i}\n' : 4.329035735998332
join long list to the end of short list:
devs contains 1000 devices
['nmcli', 'd', 'delete'] + devs : 2.9230779689969495
['nmcli', 'd', 'delete', *devs] : 2.972522068001126
pregenerate list or supply just interator object?
(f'sth{dev}' for dev in vlan_range=1000) 0.13947492800070904
[f'sth{dev}' for dev in vlan_range=1000] 0.12493522400109214
generate commands string using comprehension or commands list using for loop (number=1000)?
generate command string: 0.11782203500115429
generate command list in for loop: 3.5407861940002476
generate command list using nested comprehension: 0.18802540999968187
append to list or add to list ({number=})?
for s in strings: ret.append(s): 1.9168673099993612
for s in strings: ret += [s]: 2.878833714999928