638
edits
(→Method) |
|||
(9 intermediate revisions by the same user not shown) | |||
Line 3: | Line 3: | ||
A <code>Set()</code> constructor is proposed for a future edition of the ECMAScript programming language standard. Should the constructor take a single iterable argument or any number of arguments? One factor in such a design decision is usefulness. Variable arguments are nicer when the number of values to be added to the set is known at the call site, as in <code>Set(1, 2, 3)</code> vs. <code>Set([1, 2, 3])</code>. A single iterable argument is nicer when the number of values is not statically known, as in <code>Set(names)</code> vs. <code>Set(...names)</code>. Which need is more common? | A <code>Set()</code> constructor is proposed for a future edition of the ECMAScript programming language standard. Should the constructor take a single iterable argument or any number of arguments? One factor in such a design decision is usefulness. Variable arguments are nicer when the number of values to be added to the set is known at the call site, as in <code>Set(1, 2, 3)</code> vs. <code>Set([1, 2, 3])</code>. A single iterable argument is nicer when the number of values is not statically known, as in <code>Set(names)</code> vs. <code>Set(...names)</code>. Which need is more common? | ||
A script was used to search the Python standard library for expressions that create nonempty sets. Of these expressions, the number of values | A script was used to search several Python codebases (including the Python standard library) for expressions that create nonempty sets. Of these expressions, the number of values to be added to the new set is statically apparent in less than half (41% in the standard library, 26.8% overall). | ||
=Background= | =Background= | ||
Line 15: | Line 15: | ||
* Calling the <code>set</code> constructor. The constructor takes an optional iterable argument which provides the values. The number of values in a call to <code>set</code> is sometimes statically apparent, as in <code>set([SUCCESS, FAILURE])</code>, and sometimes not, as in <code>set(self.get_labels())</code>. | * Calling the <code>set</code> constructor. The constructor takes an optional iterable argument which provides the values. The number of values in a call to <code>set</code> is sometimes statically apparent, as in <code>set([SUCCESS, FAILURE])</code>, and sometimes not, as in <code>set(self.get_labels())</code>. | ||
= | =Method= | ||
A copy of the Python standard library and a corresponding Python executable were obtained using the commands: | A copy of the Python standard library and a corresponding Python executable were obtained using the commands: | ||
Line 28: | Line 28: | ||
(On Mac OS X Lion, configure with <code>CC=gcc-apple-4.2 ./configure</code> instead. Apple’s LLVM-based gcc that ships with Lion can’t build Python.) | (On Mac OS X Lion, configure with <code>CC=gcc-apple-4.2 ./configure</code> instead. Apple’s LLVM-based gcc that ships with Lion can’t build Python.) | ||
The script below was then run in | The script below was then run in the <code>Lib</code> directory. The script searches the current directory for Python files. It disregards unit tests, as they typically contain very unusual code. It parses each other file using the Python parser and finds each set-creation expression. The expression <code>set()</code>, which creates an empty set, is disregarded; the rest are classified as one of the following: | ||
* A - statically known number of values (supporting variable arguments) | * A - statically known number of values (supporting variable arguments) | ||
Line 34: | Line 34: | ||
* C - using a comprehension (also supporting a single iterable argument, assuming ECMAScript gets array comprehensions) | * C - using a comprehension (also supporting a single iterable argument, assuming ECMAScript gets array comprehensions) | ||
<pre> | |||
from __future__ import division | |||
import os, ast, collections | |||
def target_files(dir): | |||
for dirpath, dirnames, filenames in os.walk(dir): | |||
dirnames[:] = [d for d in dirnames if d not in ('test', 'tests', '_tests')] | |||
for f in filenames: | |||
if f.endswith('.py') and not f.endswith('Tests.py'): | |||
yield os.path.join(dirpath, f) | |||
def main(): | |||
counts = collections.defaultdict(int) | |||
loc = 0 | |||
for filename in target_files('.'): | |||
with open(filename) as f: | |||
src = f.read() | |||
lines = src.splitlines() | |||
loc += len(lines) | |||
parse_tree = ast.parse(src, filename) | |||
def log(node, code): | |||
counts[code] += 1 | |||
print('{0} {1}:{2}:{3}'.format(code, filename, node.lineno, lines[node.lineno-1])) | |||
for node in ast.walk(parse_tree): | |||
if isinstance(node, ast.Call): | |||
if isinstance(node.func, ast.Name) and node.func.id == 'set': | |||
assert len(node.args) in (0, 1) | |||
assert len(node.keywords) == 0 | |||
assert node.starargs is None | |||
assert node.kwargs is None | |||
if len(node.args) == 1: | |||
arg = node.args[0] | |||
if isinstance(arg, (ast.Str, ast.List, ast.Tuple)): | |||
log(node, 'A') | |||
elif isinstance(arg, (ast.ListComp, ast.SetComp, ast.GeneratorExp)): | |||
log(node, 'C') | |||
else: | |||
log(node, 'B') | |||
elif isinstance(node, ast.Set): | |||
log(node, 'A') | |||
elif isinstance(node, ast.SetComp): | |||
log(node, 'C') | |||
total = sum(counts.values()) | |||
print() | |||
for t, n in sorted(counts.items()): | |||
print("Type {0}: {1} ({2:.1%})".format(t, n, n / total)) | |||
print("total: {0}".format(total)) | |||
print("set expressions per 1000 lines of code: {0}".format(1000 * total / loc)) | |||
main() | |||
</pre> | |||
= Results = | = Results = | ||
The full output is shown below. | The full output from parsing the Python standard library is shown below. There are 117 qualifying set-creation expressions in the standard library. Of these: | ||
* 48 (41%) are type A (supporting a constructor with variable arguments), | * 48 (41.0%) are type A (supporting a constructor with variable arguments), | ||
* 59 (50.4%) are type B (supporting a constructor with a single iterable argument), and | * 59 (50.4%) are type B (supporting a constructor with a single iterable argument), and | ||
* 10 (8.5%) are type C (comprehensions, supporting a constructor with a single iterable argument) | * 10 (8.5%) are type C (comprehensions, supporting a constructor with a single iterable argument) | ||
The numbers for all codebases are: | |||
<table class="data"> | |||
<tr> <th>Codebase <th>A <th>B <th>C <th>total <th>density | |||
<tr> <td>standard library <td> 48 (41.0%) <td> 59 (50.4%) <td>10 (8.5%) <td>117 <td>0.45 | |||
<tr> <td>Django <td> 9 (13.6%) <td> 36 (54.5%) <td>21 (31.8%) <td>66 <td>0.65 | |||
<tr> <td>Mercurial <td> 23 (14.0%) <td>110 (67.1%) <td>31 (18.9%) <td>164 <td>2.24 | |||
<tr> <td>MoinMoin <td> 21 (72.4%) <td> 7 (24.1%) <td> 1 (3.4%) <td>29 <td>0.76 | |||
<tr> <td>scons <td> 2 (28.6%) <td> 5 (71.4%) <td> 0 <td>7 <td>0.11 | |||
<tr> <td>'''Total''' <td>103 (26.8%) <td>217 (56.6%) <td>63 (16.4%) <td>383 <td>- | |||
</table> | |||
(The density column tells how many qualifying set-expressions each project has per 1000 lines of code.) | |||
(I also ran the analysis on Zope, but it apparently does not use sets at all.) | |||
<pre> | |||
A ./Lib/_markupbase.py:152: if sectName in {"temp", "cdata", "ignore", "include", "rcdata"}: | |||
A ./Lib/_markupbase.py:155: elif sectName in {"if", "else", "endif"}: | |||
A ./Lib/_markupbase.py:210: if name not in {"attlist", "element", "entity", "notation"}: | |||
A ./Lib/_markupbase.py:129: elif decltype in {"attlist", "linktype", "link", "element"}: | |||
B ./Lib/_pyio.py:158: modes = set(mode) | |||
A ./Lib/_pyio.py:159: if modes - set("axrwb+tU") or len(mode) > len(modes): | |||
C ./Lib/_weakrefset.py:175: return self.data <= set(ref(item) for item in other) | |||
C ./Lib/_weakrefset.py:182: return self.data >= set(ref(item) for item in other) | |||
C ./Lib/_weakrefset.py:187: return self.data == set(ref(item) for item in other) | |||
C ./Lib/abc.py:132: abstracts = {name | |||
A ./Lib/abc.py:189: return any(cls.__subclasscheck__(c) for c in {subclass, subtype}) | |||
B ./Lib/ast.py:60: return set(map(_convert, node.elts)) | |||
B ./Lib/bdb.py:22: self.skip = set(skip) if skip else None | |||
C ./Lib/cgi.py:627: return list(set(item.name for item in self.list)) | |||
B ./Lib/cmd.py:287: commands = set(self.completenames(*args)) | |||
C ./Lib/cmd.py:288: topics = set(a[5:] for a in self.get_names() | |||
A ./Lib/difflib.py:1221: if tag in {'replace', 'delete'}: | |||
A ./Lib/difflib.py:1224: if tag in {'replace', 'insert'}: | |||
A ./Lib/difflib.py:1305: if any(tag in {'replace', 'delete'} for tag, _, _, _, _ in group): | |||
A ./Lib/difflib.py:1314: if any(tag in {'replace', 'insert'} for tag, _, _, _, _ in group): | |||
A ./Lib/ftplib.py:931: if treply[:3] not in {'125', '150'}: raise error_proto # RFC 959 | |||
A ./Lib/ftplib.py:933: if sreply[:3] not in {'125', '150'}: raise error_proto # RFC 959 | |||
A ./Lib/ftplib.py:178: if s[:5] in {'pass ', 'PASS '}: | |||
A ./Lib/ftplib.py:228: if c in {'1', '2', '3'}: | |||
A ./Lib/ftplib.py:252: if resp[:3] not in {'426', '225', '226'}: | |||
A ./Lib/ftplib.py:570: if resp[:3] in {'250', '200'}: | |||
A ./Lib/ftplib.py:388: if user == 'anonymous' and passwd in {'', '-'}: | |||
A ./Lib/ftplib.py:819: if resp[:3] not in {'426', '225', '226'}: | |||
B ./Lib/hashlib.py:59:algorithms_guaranteed = set(__always_supported) | |||
B ./Lib/hashlib.py:60:algorithms_available = set(__always_supported) | |||
B ./Lib/inspect.py:994: possible_kwargs = set(args + kwonlyargs) | |||
B ./Lib/mailbox.py:1614: flags = set(flags) | |||
B ./Lib/mailbox.py:1110: all_keys = set(self.keys()) | |||
B ./Lib/mailbox.py:1646: flags = set(self.get_flags()) | |||
B ./Lib/mailbox.py:1733: sequences = set(self.get_sequences()) | |||
B ./Lib/mailbox.py:1823: labels = set(self.get_labels()) | |||
B ./Lib/mailbox.py:1547: flags = set(self.get_flags()) | |||
B ./Lib/mailbox.py:1744: sequences = set(self.get_sequences()) | |||
B ./Lib/mailbox.py:1836: labels = set(self.get_labels()) | |||
B ./Lib/mailbox.py:1141: for key in sorted(set(keys)): | |||
B ./Lib/mailbox.py:1511: self.set_flags(''.join(set(self.get_flags()) | set(flag))) | |||
B ./Lib/mailbox.py:1511: self.set_flags(''.join(set(self.get_flags()) | set(flag))) | |||
B ./Lib/mailbox.py:1560: flags = set(self.get_flags()) | |||
B ./Lib/mailbox.py:1636: self.set_flags(''.join(set(self.get_flags()) | set(flag))) | |||
B ./Lib/mailbox.py:1636: self.set_flags(''.join(set(self.get_flags()) | set(flag))) | |||
B ./Lib/mailbox.py:1669: flags = set(self.get_flags()) | |||
B ./Lib/mailbox.py:1846: labels = set(self.get_labels()) | |||
B ./Lib/mailbox.py:1516: self.set_flags(''.join(set(self.get_flags()) - set(flag))) | |||
B ./Lib/mailbox.py:1516: self.set_flags(''.join(set(self.get_flags()) - set(flag))) | |||
B ./Lib/mailbox.py:1568: flags = set(self.get_flags()) | |||
B ./Lib/mailbox.py:1641: self.set_flags(''.join(set(self.get_flags()) - set(flag))) | |||
B ./Lib/mailbox.py:1641: self.set_flags(''.join(set(self.get_flags()) - set(flag))) | |||
B ./Lib/mailbox.py:1679: flags = set(self.get_flags()) | |||
B ./Lib/mailbox.py:1757: sequences = set(self.get_sequences()) | |||
A ./Lib/netrc.py:74: tt in {'', 'machine', 'default', 'macdef'}): | |||
A ./Lib/nntplib.py:124:_LONGRESP = { | |||
A ./Lib/pydoc.py:166: if name in {'__builtins__', '__doc__', '__file__', '__path__', | |||
A ./Lib/pydoc.py:2022: if modname in {'test.badsyntax_pep3120', 'badsyntax_pep3120'}: | |||
B ./Lib/shutil.py:204: return set(ignored_names) | |||
B ./Lib/site.py:84: for m in set(sys.modules.values()): | |||
A ./Lib/socket.py:233:_blocking_errnos = { EAGAIN, EWOULDBLOCK } | |||
A ./Lib/socket.py:152: if c not in {"r", "w", "b"}: | |||
A ./Lib/sre_compile.py:27:_LITERAL_CODES = set([LITERAL, NOT_LITERAL]) | |||
A ./Lib/sre_compile.py:28:_REPEATING_CODES = set([REPEAT, MIN_REPEAT, MAX_REPEAT]) | |||
A ./Lib/sre_compile.py:29:_SUCCESS_CODES = set([SUCCESS, FAILURE]) | |||
A ./Lib/sre_compile.py:30:_ASSERT_CODES = set([ASSERT, ASSERT_NOT]) | |||
A ./Lib/sre_parse.py:22:DIGITS = set("0123456789") | |||
A ./Lib/sre_parse.py:24:OCTDIGITS = set("01234567") | |||
A ./Lib/sre_parse.py:25:HEXDIGITS = set("0123456789abcdefABCDEF") | |||
A ./Lib/sre_parse.py:27:WHITESPACE = set(" \t\n\r\v\f") | |||
A ./Lib/sre_parse.py:381:_PATTERNENDERS = set("|)") | |||
A ./Lib/sre_parse.py:382:_ASSERTCHARS = set("=!<") | |||
A ./Lib/sre_parse.py:383:_LOOKBEHINDASSERTCHARS = set("=!") | |||
A ./Lib/sre_parse.py:384:_REPEATCODES = set([MIN_REPEAT, MAX_REPEAT]) | |||
B ./Lib/stringprep.py:19:b1_set = set([173, 847, 6150, 6155, 6156, 6157, 8203, 8204, 8205, 8288, 65279] + list(range(65024,65040))) | |||
B ./Lib/stringprep.py:220:c22_specials = set([1757, 1807, 6158, 8204, 8205, 8232, 8233, 65279] + list(range(8288,8292)) + list(range(8298,8304)) + list(range(65529,65533)) + list(range(119155,119163))) | |||
B ./Lib/stringprep.py:247:c6_set = set(range(65529,65534)) | |||
B ./Lib/stringprep.py:252:c7_set = set(range(12272,12284)) | |||
B ./Lib/stringprep.py:257:c8_set = set([832, 833, 8206, 8207] + list(range(8234,8239)) + list(range(8298,8304))) | |||
B ./Lib/stringprep.py:262:c9_set = set([917505] + list(range(917536,917632))) | |||
B ./Lib/subprocess.py:1280: fds_to_keep = set(pass_fds) | |||
B ./Lib/sysconfig.py:724: archs = tuple(sorted(set(archs))) | |||
A ./Lib/tarfile.py:129:PAX_NAME_FIELDS = {"path", "linkpath", "uname", "gname"} | |||
B ./Lib/trace.py:133: self._mods = set() if not modules else set(modules) | |||
B ./Lib/collections/abc.py:421: return set(it) | |||
B ./Lib/collections/abc.py:437: return set(it) | |||
C ./Lib/concurrent/futures/_base.py:192: finished = set( | |||
C ./Lib/concurrent/futures/_base.py:254: done = set(f for f in fs | |||
B ./Lib/concurrent/futures/_base.py:195: pending = set(fs) - finished | |||
B ./Lib/concurrent/futures/_base.py:256: not_done = set(fs) - done | |||
B ./Lib/concurrent/futures/_base.py:275: return DoneAndNotDoneFutures(done, set(fs) - done) | |||
A ./Lib/distutils/msvc9compiler.py:255: interesting = set(("include", "lib", "libpath", "path")) | |||
B ./Lib/distutils/util.py:153: archs = tuple(sorted(set(archs))) | |||
A ./Lib/idlelib/CodeContext.py:18:BLOCKOPENERS = set(["class", "def", "elif", "else", "except", "finally", "for", | |||
A ./Lib/idlelib/EditorWindow.py:1614: if (not keylist) or (macosxSupport.runningAsOSXApp() and eventname in { | |||
C ./Lib/idlelib/MultiCall.py:127: substates = list(set(state & x for x in states)) | |||
A ./Lib/lib2to3/fixer_util.py:167:consuming_calls = set(["sorted", "list", "set", "any", "all", "tuple", "sum", | |||
A ./Lib/lib2to3/fixer_util.py:339:_def_syms = set([syms.classdef, syms.funcdef]) | |||
A ./Lib/lib2to3/fixer_util.py:382:_block_syms = set([syms.funcdef, syms.classdef, syms.trailer]) | |||
B ./Lib/lib2to3/main.py:220: avail_fixes = set(refactor.get_fixers_from_package(fixer_pkg)) | |||
C ./Lib/lib2to3/main.py:221: unwanted_fixes = set(fixer_pkg + ".fix_" + fix for fix in options.nofix) | |||
A ./Lib/lib2to3/patcomp.py:35: skip = set((token.NEWLINE, token.INDENT, token.DEDENT)) | |||
A ./Lib/lib2to3/refactor.py:60: return set([pat.type]) | |||
A ./Lib/lib2to3/fixes/fix_dict.py:39:iter_exempt = fixer_util.consuming_calls | set(["iter"]) | |||
B ./Lib/lib2to3/fixes/fix_numliterals.py:25: elif val.startswith('0') and val.isdigit() and len(set(val)) > 1: | |||
B ./Lib/multiprocessing/managers.py:397: self.id_to_obj[ident] = (obj, set(exposed), method_to_typeid) | |||
B ./Lib/packaging/run.py:237: for dist in set(opts['args']): | |||
A ./Lib/packaging/compiler/msvc9compiler.py:243: interesting = set(("include", "lib", "libpath", "path")) | |||
B ./Lib/packaging/pypi/simple.py:138: self._mirrors = set(mirrors) | |||
B ./Lib/packaging/pypi/xmlrpc.py:184: return [self._projects[name.lower()] for name in set(projects)] | |||
B ./Lib/packaging/pypi/xmlrpc.py:95: hidden_versions = set(all_versions) - set(existing_versions) | |||
B ./Lib/packaging/pypi/xmlrpc.py:95: hidden_versions = set(all_versions) - set(existing_versions) | |||
A ./Lib/wsgiref/handlers.py:25:_is_request = { | |||
B ./Lib/xml/etree/ElementTree.py:990: HTML_EMPTY = set(HTML_EMPTY) | |||
B ./Lib/xmlrpc/server.py:279: methods = set(self.funcs.keys()) | |||
B ./Lib/xmlrpc/server.py:284: methods |= set(self.instance._listMethods()) | |||
B ./Lib/xmlrpc/server.py:289: methods |= set(list_public_methods(self.instance)) | |||
Type A: 48 (41.0%) | |||
Type B: 59 (50.4%) | |||
Type C: 10 (8.5%) | |||
</pre> |
edits