Another wonderful day in Python-land. Today I met a lot of other people and had an interesting discussion with Steve Alexander from Canonical. I also had the chance to ask Guido a question about the “hated” buffer object protocol after his talk about the future of our beloved language.
I had a wonderful evening at CERN's Globe, in front of the CERN facility, where I had dinner with all the devs. The dinner was introduced by an amazing talk from the head of the CERN IT Department, Mr. Wolfgang von Rüden. He talked about the history and organization of CERN and the importance of IT in CERN's work. They have a massive grid computing infrastructure to handle the data from the millions of sensors of the LHC (the underground accelerator) and ATLAS (the detector, still under construction). A lot of data: we're talking petabytes a day. Yes, P-E-T-A-B-Y-T-E-S.
The grid is spread all over the world: mostly in Europe, but in the USA and Asia as well.
Let’s look at some interesting bits from today’s tracks:
Python as a domain specific language
I was a bit late because I was talking with someone, so I only have a few, rather useless notes, but the slides are online.
Syntax checking with __new__
Metamodules
External (remote) methods:
- decorator for return type
- default arguments for call types
@method(Int())
def fooMethod(self, fooArg=String()):
    return [42]
Multiple inheritance to do mixin functionality
Create a DSL using Python's power
6 months to code it
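To make the "decorator for return type / default arguments for call types" idea concrete, here is a minimal sketch. The `method`, `Int` and `String` helpers are hypothetical stand-ins named after the notes; they only type-check values, while the speaker's real framework presumably also handled the remote call itself, which is not shown here.

```python
import functools
import inspect


class Type:
    """Hypothetical base class for DSL type markers such as Int() and String()."""
    pytype = object

    def check(self, value):
        if not isinstance(value, self.pytype):
            raise TypeError("expected %s, got %r" % (self.pytype.__name__, value))
        return value


class Int(Type):
    pytype = int


class String(Type):
    pytype = str


def method(return_type):
    """Hypothetical decorator: its argument declares the return type, and
    Type instances used as default values declare the argument types."""
    def decorator(func):
        sig = inspect.signature(func)
        arg_types = {name: p.default for name, p in sig.parameters.items()
                     if isinstance(p.default, Type)}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                if name in arg_types:
                    arg_types[name].check(value)  # validate declared argument types
            return return_type.check(func(*args, **kwargs))  # validate the result

        wrapper.return_type = return_type  # keep the declarations for a remote layer
        wrapper.arg_types = arg_types
        return wrapper
    return decorator


class Service:
    @method(Int())
    def fooMethod(self, fooArg=String()):
        return len(fooArg)


print(Service().fooMethod("hello"))  # 5
```

Keeping the declared types on the wrapper is what a remote-call layer would need to marshal arguments; calling `fooMethod(3)` raises `TypeError` in this sketch.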
Developing MailManager
MailManager is a self-explanatory application that seemed really nice to me. It doesn't support digital signatures, though.
VC-funded company, based on Zope
- ticketing system for dealing with incoming email
- distributed email system
- admin module
- reports
- xhtml/css interface
- unicode and i18n support
- queueing support for high volume
- filtering for email
- rulesets to customize further
- PostgreSQL and MySQL now
- SQL Server, Oracle and SQLite soon
- RPC interface
Company history:
- founded in 2002
- funded in 2005 with £300k
- 2006: MailManager 3.1
TDD and continuous integration (with Buildbot), VMware
PyJIT: dynamic code generation from runtime data
This talk was really cool. Simon Burton, a genius-but-kinda-weird guy from Australia, showed us how his JIT can beat C at numeric processing.
UPDATE: his work was kinda hard to find, but here’s the SVN repository: pyjit svn
Simon Burton, National ICT Australia
Python generates machine code. Go faster!
JIT compilation
- apps: numerical linear algebra, decision trees
- inline the parameters and compile to machine code
- Driving PyJIT:
- with Python source code -> operations on native types, good for numeric processing
- with low-level constructs
- use LLVM: very fast!
app 1: numerical linear algebra using PETSc
app 2: decision tree
faster than the C-coded program: 2.5x speedup
app 3: vectorized operations (where)
where(a < cutoff, b, c)
- a, b, c arrays same length
- result[i] = (a[i] < cutoff) ? b[i] : c[i]
50 times faster than Python code, 3 times faster than Psyco, 8 times faster than NumPy
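For reference, these are the element-wise semantics of `where(a < cutoff, b, c)` written as a plain-Python loop, roughly the kind of "Python code" baseline such speedups are measured against (the exact benchmark code is not in my notes). `where_less` is a hypothetical name; with NumPy arrays the same result comes from `numpy.where(a < cutoff, b, c)`.

```python
def where_less(a, cutoff, b, c):
    """result[i] = b[i] if a[i] < cutoff else c[i], for equal-length sequences."""
    if not len(a) == len(b) == len(c):
        raise ValueError("a, b and c must have the same length")
    return [bi if ai < cutoff else ci for ai, bi, ci in zip(a, b, c)]


print(where_less([1, 5, 2], 3, [10, 20, 30], [-1, -2, -3]))  # [10, -2, 30]
```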
app 4: interval arithmetic
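I don't know how the PyJIT demo implemented it, but textbook interval arithmetic is small enough to sketch: each value is a range [lo, hi], and every operation returns the range of all possible results. This hypothetical `Interval` class ignores floating-point outward rounding, which a serious implementation needs.

```python
class Interval:
    """Closed interval [lo, hi] with the basic interval-arithmetic rules."""

    def __init__(self, lo, hi):
        if lo > hi:
            raise ValueError("lo must not exceed hi")
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # [a, b] - [c, d] = [a - d, b - c]
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # the bounds are the min and max of the four endpoint products
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __repr__(self):
        return "Interval(%r, %r)" % (self.lo, self.hi)


x = Interval(1, 2)
y = Interval(-3, 4)
print(x + y, x - y, x * y)  # Interval(-2, 6) Interval(-3, 5) Interval(-6, 8)
```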
future: translate entire Python programs (like Psyco); implement NumPy semantics with optimizations
After the coffee break there were three talks in a row about PyPy.
An Introduction to PyPy
Michael Hudson gave an overview of the whole project and answered some interesting questions:
What it is:
- implementation of python
- compiler framework, targeted at interpreters
- MIT licensed
- research project funded by EU
- lot of fun
Demo:
- pypy-c interactive console
- as fast as Jython
- customized binaries
Motivation:
- extend the implementation of python
- jit, stackless, ease porting
CPython is written in C, hard to port
RPython: subset of python, restricted python, static enough
Translation aspects: RPython is very high level
- produce customized Python implementations
Standard interpreter:
- bytecode evaluator: same bytecode as CPython but treats objs as black boxes
- standard object space: library of objects
- parser/compiler
pretty stable now
Compiler Framework:
- flow analysis: deduce a control flow from python code
- annotator: deduce types and other information
- rtyper: uses the annotator information to lay out types in memory
- backend: translates into code (C, CLI, LLVM)
The flow model:
c = a + b -> v_c = add(v_a, v_b)
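PyPy builds its flow graphs by abstractly interpreting bytecode. As a loose, much simplified analogy (not PyPy's code), the sketch below "traces" a function by calling it with symbolic `Var` objects that record each operation in the `v_c = add(v_a, v_b)` style. `Var` and `trace` are hypothetical names, and unlike the real flow analysis this toy cannot follow branches or loops.

```python
class Var:
    """A symbolic value that records operations instead of computing them."""

    def __init__(self, name, ops):
        self.name = name
        self.ops = ops  # operation list shared by all variables of one trace

    def _record(self, opname, other):
        result = Var("v%d" % len(self.ops), self.ops)
        operand = other.name if isinstance(other, Var) else repr(other)
        self.ops.append("%s = %s(%s, %s)" % (result.name, opname, self.name, operand))
        return result

    def __add__(self, other):
        return self._record("add", other)

    def __mul__(self, other):
        return self._record("mul", other)


def trace(func, *argnames):
    """Call func on symbolic arguments; return (recorded operations, result name)."""
    ops = []
    result = func(*[Var("v_" + name, ops) for name in argnames])
    return ops, result.name


def f(a, b):
    c = a + b
    return c * 2


print(trace(f, "a", "b"))  # (['v0 = add(v_a, v_b)', 'v1 = mul(v0, 2)'], 'v1')
```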
The annotator:
- operates on live objects
PyPy is hiring!
PyPy architecture session
This talk was very interesting, but I was so focused on listening that I didn't take any notes. Sorry!
What can PyPy do for you?
The PyPy guys demonstrated how you can use PyPy for real: the thunk object space to compute objects lazily, the logic object space to simulate Prolog-style logic constructs, the JavaScript backend with HTML and Bub-n-Bros, and much more interesting stuff!
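The thunk object space makes laziness transparent at the interpreter level. In plain CPython the idea can only be approximated with an explicit wrapper, so treat this as an analogy rather than PyPy's API: the hypothetical `Thunk` class delays a call until its value is first forced, then caches the result.

```python
class Thunk:
    """Delay func(*args) until force() is called, then remember the result."""

    _UNSET = object()  # sentinel meaning "not computed yet"

    def __init__(self, func, *args):
        self._func = func
        self._args = args
        self._value = Thunk._UNSET

    def force(self):
        if self._value is Thunk._UNSET:
            self._value = self._func(*self._args)
            self._func = self._args = None  # drop references once computed
        return self._value


calls = []


def expensive(n):
    calls.append(n)  # record every real computation
    return n * n


t = Thunk(expensive, 12)
print(calls)      # []: nothing computed yet
print(t.force())  # 144
print(t.force())  # 144 again, from the cache
print(calls)      # [12]: expensive() ran exactly once
```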
Using decorators
After the break (my head was kinda exploding from all the PyPy niceties), Michele Simionato took us into the world of decorators. Not very deep, actually: he mainly talked about what decorators do in the real world and how they can be useful.
@decor
def foo(): pass
shortcut for
def foo(): pass
foo = decor(foo)
Decorators changed the way we think about functions
with statement + contextlib module
def traced(func):
    def newfunc(*args, **kwargs):
        print("calling %s with arguments %s, %s" % (func.__name__, args, kwargs))
        return func(*args, **kwargs)
    return newfunc
@traced
def square(x): return x * x
the wrapper breaks the info shown by help() (name, docstring, signature)
Use Simionato's decorator module (see the link below) to keep that info.
Decorator uses:
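A partial fix has been in the standard library since Python 2.5: `functools.wraps` copies the wrapped function's name and docstring onto the wrapper. On Python 2, `help()` still shows a `(*args, **kwargs)` signature, which is the gap the decorator module closes; on Python 3, `wraps` also sets `__wrapped__`, which `inspect.signature` follows.

```python
import functools


def traced(func):
    @functools.wraps(func)  # copy __name__, __doc__, __module__, ... onto the wrapper
    def newfunc(*args, **kwargs):
        print("calling %s with arguments %s, %s" % (func.__name__, args, kwargs))
        return func(*args, **kwargs)
    return newfunc


@traced
def square(x):
    """Return x squared."""
    return x * x


print(square.__name__, square.__doc__)  # square Return x squared.
print(square(3))
# calling square with arguments (3,), {}
# 9
```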
- tracing
- timing:
@time
def mycomputation(): pass
- logging
- caching:
@cached
def mylongcomputation(): pass
- access control:
@admin
@user
- tail_recursive:
@tail_recursive
def fact(n, acc=1): return acc if n <= 1 else fact(n - 1, acc * n)
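Two of the decorators above can be sketched in a few lines. These are my own minimal versions, not the ones from Simionato's module: `cached` memoizes on positional arguments (which must be hashable), and `tail_recursive` uses a trampoline, a swapped-in technique: a recursive call made while the loop is running just hands back its arguments, and the outermost call loops instead of growing the stack. It only works when every recursive call is in tail position, and it is not thread-safe.

```python
import functools


def cached(func):
    """Memoize func on its (hashable) positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper


class _TailCall:
    """Marker holding the arguments of a pending tail call."""

    def __init__(self, args, kwargs):
        self.args, self.kwargs = args, kwargs


def tail_recursive(func):
    """Trampoline: run tail calls in a loop instead of nesting stack frames."""
    running = [False]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if running[0]:
            # a recursive call from inside func: hand the arguments back
            return _TailCall(args, kwargs)
        running[0] = True
        try:
            result = func(*args, **kwargs)
            while isinstance(result, _TailCall):
                result = func(*result.args, **result.kwargs)
            return result
        finally:
            running[0] = False
    return wrapper


@cached
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@tail_recursive
def fact(n, acc=1):
    return acc if n <= 1 else fact(n - 1, acc * n)


print(fib(30))   # 832040
print(fact(5))   # 120
big = fact(5000)  # far deeper than the default recursion limit, no RecursionError
```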
http://www.phyast.pitt.edu/~micheles/python/documentation.html
Caveats:
- you may have performance issues
- your traceback will become longer
- you may end up being too clever
http://wiki.python.org/moin/PythonDecorators
Useful and New Modules
Andrew Dalke introduced some of the new modules in the Python 2.5 stdlib, plus some older ones:
SQLite:
- easy to setup, small and self-contained
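A minimal taste of the `sqlite3` module (new in Python 2.5), using an in-memory database so there is nothing to set up. The table and rows are made up for illustration.

```python
import sqlite3

# ":memory:" creates a throwaway database: nothing to install or configure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE talks (speaker TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO talks VALUES (?, ?)",  # ? placeholders avoid manual quoting
    [("Michael Hudson", "An Introduction to PyPy"),
     ("Andrew Dalke", "Useful and New Modules")],
)
conn.commit()

rows = conn.execute("SELECT speaker, title FROM talks ORDER BY speaker").fetchall()
for speaker, title in rows:
    print(speaker, "-", title)
conn.close()
```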
ctypes:
- ffi in the stdlib in python 2.5
- argtypes, restype, C types
- use setup.py to use arbitrary libraries
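Declaring `argtypes` and `restype` looks like this when calling the C library's `strlen`. The sketch assumes a POSIX-like system: if `find_library("c")` returns `None`, `CDLL(None)` falls back to the symbols already loaded in the running process, which include libc there.

```python
import ctypes
import ctypes.util

# Load the C standard library (POSIX assumption, see above).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C prototype: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"EuroPython"))  # 10
```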
ElementTree:
- process XML fast
- use it in conjunction with the with statement
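A small parse with `xml.etree.ElementTree`. The file-like object is opened in a `with` block, which is my reading of the note above; the XML itself is invented for the example, and `iter()` is the modern spelling of 2.5's `getiterator()`.

```python
import io
import xml.etree.ElementTree as ET

data = b"""<schedule>
  <talk speaker="Simon Burton">PyJIT</talk>
  <talk speaker="Holger Krekel">py.execnet</talk>
</schedule>"""

# ET.parse() accepts any file-like object; 'with' closes it afterwards.
with io.BytesIO(data) as f:
    tree = ET.parse(f)

for talk in tree.getroot().iter("talk"):
    print(talk.get("speaker"), "->", talk.text)
```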
subprocess:
- replaces system, popen, commands, pipes
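Running a child process and capturing its output with `subprocess`. This uses today's `subprocess.run` (Python 3.5+); in 2.5 the equivalent was `Popen(...).communicate()`. The child here is just another Python interpreter, so the example runs anywhere Python does.

```python
import subprocess
import sys

# No shell is involved: the argument list is passed straight to the child.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from the child')"],
    stdout=subprocess.PIPE,
    universal_newlines=True,  # decode stdout to str
    check=True,               # raise CalledProcessError on a non-zero exit
)
print(result.returncode, result.stdout.strip())  # 0 hello from the child
```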
csv:
- reader/writer, quoted fields
- supports Excel dialects
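A round trip through `csv` with the `excel` dialect, showing a field that contains the delimiter being quoted on write and restored on read. The rows are illustrative.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, dialect="excel")
writer.writerow(["speaker", "talk"])
writer.writerow(["Michele Simionato", "Using decorators, in practice"])  # contains a comma

buffer.seek(0)
rows = list(csv.reader(buffer, dialect="excel"))
print(rows)
```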
optparse:
- getopt can become very complicated
- optparse has aspects in different location
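For comparison with getopt, a small `optparse` parser. Parsing an explicit list instead of `sys.argv` keeps the example self-contained; note that `argparse` (Python 2.7/3.2) later superseded optparse, which is still in the stdlib.

```python
from optparse import OptionParser

parser = OptionParser(usage="usage: %prog [options] FILE")
# declare each option: flags, destination, type, default and help text
parser.add_option("-v", "--verbose", action="store_true", dest="verbose",
                  default=False, help="print more output")
parser.add_option("-n", "--count", type="int", dest="count", default=1,
                  help="number of repetitions")

options, args = parser.parse_args(["-v", "-n", "3", "notes.txt"])
print(options.verbose, options.count, args)  # True 3 ['notes.txt']
```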
bisect:
- binary search
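`bisect` does binary search on an already-sorted list, and `insort` inserts while keeping it sorted:

```python
import bisect

scores = [10, 20, 30, 40]               # must already be sorted
print(bisect.bisect_left(scores, 30))   # 2: index of the first element >= 30
print(bisect.bisect_right(scores, 30))  # 3: index of the first element > 30
bisect.insort(scores, 25)               # insert while keeping the list sorted
print(scores)                           # [10, 20, 25, 30, 40]
```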
py.execnet: ad-hoc networking
Holger Krekel presented his in-development project about distributed computing:
Reasons for distributing services:
- remote access to local system resources
- security
- reliability
- scalability
network protocols:
- RPC
- text based protocols (chat)
- web services, soap
global standards are useful for large scale cooperative programs!
problems with standards:
- matching and compatible sw versions
- prior installation, configuration and setup
- overhead on designing, testing and maintaining the standard
- GUID schemes for refactoring
py.execnet concepts:
- client side injects local protocol code
- client and the other side interacts through Channels
- channels can receive and send arbitrary marshallable python structs
- asynchronously executing program fragments
channels and gateways:
- DEMO
Example: remote file processing
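In py.execnet you open a gateway, call `remote_exec(source)` on it and talk to the injected code through the returned channel, across processes or SSH. To show just the two concepts from the notes (injected source code, and a channel carrying marshallable data), here is a single-process toy. `LocalGateway` and `Channel` are hypothetical names: a thread stands in for the remote side, and `marshal` enforces the marshallable-data rule.

```python
import marshal
import queue
import threading


class Channel:
    """One end of a bidirectional channel; data must be marshallable."""

    def __init__(self, inbox, outbox):
        self._inbox = inbox
        self._outbox = outbox

    def send(self, obj):
        # marshal.dumps raises ValueError for unmarshallable objects
        self._outbox.put(marshal.dumps(obj))

    def receive(self, timeout=5):
        return marshal.loads(self._inbox.get(timeout=timeout))


class LocalGateway:
    """Toy gateway: executes injected source in a thread, not a remote process."""

    def remote_exec(self, source):
        to_remote, to_local = queue.Queue(), queue.Queue()
        remote_end = Channel(inbox=to_remote, outbox=to_local)
        local_end = Channel(inbox=to_local, outbox=to_remote)
        worker = threading.Thread(target=exec, args=(source, {"channel": remote_end}),
                                  daemon=True)
        worker.start()
        return local_end


gateway = LocalGateway()
channel = gateway.remote_exec("""
while True:
    path = channel.receive(timeout=None)
    if path is None:
        break
    channel.send((path, len(path)))
""")

for name in ["/etc/hosts", "/tmp/notes.txt"]:
    channel.send(name)
    print(channel.receive())  # (path, length) computed on the "remote" side
channel.send(None)  # tell the remote side to stop
```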
status: usable for 2 peer distribution/deployment
makes distribution easy but sharing state hard
channels cannot span multiple gateways/hops yet
basically works on win32, osx and linux (ssh not on win32)
future:
- dev happens on a demand basis
- support for better sharing
- extending p2p architecture
- py-dev@codespeak.net
- training/support possible
That was all for the regular tracks. I very much enjoyed the lightning talks that followed: you can learn a lot of stuff in 5 minutes. I'm looking forward to tomorrow's session. It's amazing to see the man sitting next to you nodding and saying "Ecco quello che cerco!" ("That's what I was looking for!")
Here are all my notes from the lightning talks:
Massa is a great entertainer
Moshe about components:
- http://twistedmatrix.com/users/moshez/lightning.html
Michael Hudson about pydoctor:
- http://codespeak.net/svn/user/mwh/pydoctor/trunk/
- uses the AST module in Python
- analyses systems or whole collections of modules
- works "outside in" or "breadth first"
interesting features:
- extract docstrings
- computes hierarchies, understands zope.interface
- local imports
- 2500 LOC
- tested
- undocumented
how it works:
- generate a pickle
- generate HTML from pickle
- used by Twisted API doc
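pydoctor's first step, extracting docstrings by reading the syntax tree instead of importing the code, can be approximated with today's `ast` module. `collect_docstrings` is a hypothetical helper, not pydoctor's code, and it only looks at module, class and function docstrings.

```python
import ast

SOURCE = '''
"""Demo module."""

class Gateway:
    """Opens channels."""

    def remote_exec(self, source):
        """Run source remotely."""
'''


def collect_docstrings(source, module_name="demo"):
    """Map dotted names to docstrings without importing or running the code."""
    docs = {}

    def visit(node, prefix):
        doc = ast.get_docstring(node)
        if doc:
            docs[prefix] = doc
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                visit(child, prefix + "." + child.name)

    visit(ast.parse(source), module_name)
    return docs


print(collect_docstrings(SOURCE))
```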
Matei Ciobotaru about ATLAS (CERN and University of California):
data acquisition framework
- 150 Gb/sec of data from the detector
- 3000 computers network
how it works:
network test
- python controls an FPGA-based network tester
- test executed and results analyzed using python scripts
- low-level PCI hw with a C API
- distributed fw with XML-RPC
- automatic generation of plots
net management
- python module to communicate with ethernet switches
- devices can be entirely managed using python scripts
- discovery of the physical topology of the network
- generation of reports
real-time monitoring
- python used in network traffic monitoring
- real time data gathered with SNMP and sFlow
- measure link occupancy and find the most active user of the net
- troubleshooting network congestion issues
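Link occupancy from SNMP is usually derived from two samples of an interface's octet counter (for example `ifInOctets`, a 32-bit counter that wraps around) and the link speed. The notes don't say how the ATLAS tools computed it, so this is the textbook formula as a sketch; `link_utilization` is a hypothetical helper and no SNMP library is involved.

```python
COUNTER32_MODULUS = 2 ** 32


def link_utilization(octets_t0, octets_t1, seconds, link_bps):
    """Fraction of link capacity used between two counter samples."""
    delta = octets_t1 - octets_t0
    if delta < 0:
        # the 32-bit counter wrapped around once between the samples
        delta += COUNTER32_MODULUS
    bits = delta * 8
    return bits / (seconds * link_bps)


# 75,000,000 octets in 10 s on a 1 Gbit/s link: 60 Mbit/s, i.e. 6% busy
print(link_utilization(1_000_000, 76_000_000, 10, 1_000_000_000))  # 0.06
```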
Instance Manager:
- http://plone.org/projects/instancemanager
- zope tool to automate stuff
Armin Rigo on pypy's sharedref.py:
- very stunning
Philipp von Weitershausen about zope.testbrowser:
- selenium too slow, can't be automated (buildbot)
- zope.testbrowser is a programmable browser
- zope.testrecorder to record tests
Fantastic tool to test web pages in Zope. doctest integrated too.
Usable also with any web application (!!!)
Sebastian Lopienski about secure sw:
- http://cern.ch/SecureSoftware
spreed.com:
- real time app for conferencing system
- zope based
itools.catalog demo by Luis:
- 16M of data
- 23.3 seconds to index it
- quite slow compared to the hype, I think
Philipp von Weitershausen about properties vs decorators:
- property() leaves getters and setters in the class
- @apply doesn't leave getters and setters behind, but it's not compact
- http://codespeak.net/svn/user/philikon/rwproperty
- http://codespeak.net/svn/user/philikon/classproperty
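The two idioms compared in the talk look roughly like this. Python 2's builtin `apply()` no longer exists in Python 3, so the sketch defines a stand-in to keep it runnable. The third class uses `@property` plus `@x.setter`, added later in Python 2.6, which avoids both problems.

```python
class Leaky:
    def _get_x(self):
        return self._x

    def _set_x(self, value):
        self._x = value

    # property() works, but _get_x and _set_x stay behind as class attributes
    x = property(_get_x, _set_x)


def apply(func):
    """Stand-in for Python 2's builtin apply(func): just call it."""
    return func()


class Tidy:
    @apply
    def x():
        # getter and setter are local names, so only 'x' ends up in the class
        def fget(self):
            return self._x

        def fset(self, value):
            self._x = value

        return property(fget, fset)


class Modern:
    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value


for cls in (Leaky, Tidy, Modern):
    obj = cls()
    obj.x = 42
    print(cls.__name__, obj.x, sorted(n for n in vars(cls) if not n.startswith("__")))
```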
Rob Collins about PSF:
- ladybird massage to raise funds for the PSF (lol)
Marcel MacMahon about pywinauto
The final session was the keynote from GvR himself. He talked at length about where Python is heading and about his vision. He also answered a bunch of questions, including mine.
The Future of Python
Python 3000:
- not a new language
- fix design bugs
- allow incompatible changes within reason
- get rid of deprecated stuff
- consider what's best going forward
Python 3.0 does not have to maximize breakage but allow breakage!
Classic classes will definitely go away.
Python 3000 process:
- need a process or get lost
- too many proposals competing for time
- don't want to become the next perl 6
- meta questions
- python 2.x has to be maintained
- some stuff will be backported
- first alpha: not before next year
- final release: a year after that probably
- may release 3.1 and 3.2 soon after
- python 2.x and 3.x developed in parallel
Incompatibilities:
- new keywords allowed
- dict.keys(), range(), zip() won't return lists, killing dict.iterkeys(), xrange(), itertools.izip()
- all strings Unicode; mutable 'bytes' data type
- binary file I/O redesign
- drop <> as alias for !=
- but not:
- dict.keys instead of dict.keys()
- change meaning of else-clause on for/while
- change operator priorities
How to Migrate Code:
- can't do perfect mechanical translation:
- many proposed changes are semantic, not syntactic
- most likely approach:
- use a pychecker-like tool to do an 80+% job
- create an instrumented version of Python 2.x that warns about doomed code (e.g. d.keys().sort())
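The "doomed code" example is easy to reproduce on a Python 3 interpreter: `keys()` returns a view with no `sort()` method, while `sorted(d)` works on both 2.x and 3.x. Likewise, `range()` and `zip()` return lazy objects that need `list()` when a real list is wanted.

```python
d = {"b": 2, "a": 1}

try:
    keys = d.keys()
    keys.sort()  # fine on Python 2, where keys() returned a list
except AttributeError as err:
    print("doomed:", err)

print(sorted(d))                # ['a', 'b'], works on 2.x and 3.x
print(list(range(3)))           # [0, 1, 2]
print(list(zip("ab", [1, 2])))  # [('a', 1), ('b', 2)]
```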
Five is right out:
- PEP 3099
- python 3k will not:
- have programmable syntax or macros
- add syntax for parallel iteration (use zip())
- change hash, keys etc. to attributes
- change iterating over a dict to yield key-value pairs
- we shouldn't:
- change the look and feel of the language too much
- make confusing changes
- add radical new features
Python 3000 Features:
- PEP 3100
- python.org/dev/peps
- blog blah blah
Basic cleanup:
- kill classic classes
- exceptions will derive from BaseException
- int / int will return a float
- remove the last differences between int and long
- absolute import by default
- kill sys.exc_type and friends (exc_info() stays)
- kill dict.has_key(), file.xreadlines(), ...
- kill apply(), input(), buffer(), coerce(), ...
- kill ancient library modules
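A few of these cleanups as they behave on Python 3. True division has been available on Python 2.2+ via `from __future__ import division` (left out here, because a `__future__` import must be the first statement of a module).

```python
import sys

print(7 / 2)   # 3.5: int / int returns a float
print(7 // 2)  # 3: floor division keeps the old integer result

# every exception, including KeyboardInterrupt and SystemExit, derives from BaseException
print(issubclass(KeyboardInterrupt, BaseException), issubclass(ValueError, Exception))

try:
    1 / 0
except ZeroDivisionError:
    exc_type, exc_value, _ = sys.exc_info()  # exc_info() stays; sys.exc_type is gone
    print(exc_type.__name__, exc_value)
```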
Minor syntactic changes:
- exec becomes a function again
- kill backticks (`x`) in favor of repr(x)
- change except clause syntax to
except E1, E2, E3 as err:
- this avoids the bug in
except E1, E2: # meant except (E1, E2)
- [f(x) for x in S] becomes syntactic sugar for list(f(x) for x in S)
- subtle changes in need of parentheses
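For comparison, here is how these changes finally landed in Python 3, which is not always the exact syntax proposed in the talk: the exception list needs parentheses (`except (E1, E2) as err`), `exec` is an ordinary function, and a list comprehension no longer leaks its loop variable, just like a generator expression.

```python
def parse(value):
    try:
        return int(value)
    except (TypeError, ValueError) as err:  # the tuple catches either exception
        return "bad input: %s" % type(err).__name__


print(parse("42"), parse("x"), parse(None))  # 42 bad input: ValueError bad input: TypeError

namespace = {}
exec("y = 6 * 7", namespace)  # exec is a function
print(namespace["y"], repr("x"))  # 42 'x'

x = "outer"
squares = [x * x for x in range(4)]
print(squares, x)  # [0, 1, 4, 9] outer: the comprehension's x did not leak
```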
range() becomes xrange():
- range() will no longer return a list (no iterator either)
- fix xrange() long support
zip() becomes izip()
lambda lives!:
- last year I said I wanted lambda to die (but open to a better version)
- we still don't have a better version (despite a year of trying)
- so lambda lives!
String types reform:
- bytes and str instead of str and unicode:
- bytes is a mutable array of int (in range(256))
- encode/decode API?
- bytes have some str-ish methods (e.g. b1.find(b2))
- but not others (not b.upper())
- all data is either binary or text:
- all text data is represented as Unicode
- conversion happens at I/O time
- different API for binary and text streams:
- platform decides file encoding
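As released, Python 3 kept this text/binary split, with some details changed from the plan: `bytes` is immutable and `bytearray` is the mutable variant. The encode/decode round trip looks like this (the sample text borrows a name from this post):

```python
text = "Rüden"                        # str: Unicode text
data = text.encode("utf-8")           # bytes: the encoded form
print(data)                           # b'R\xc3\xbcden'
print(data.decode("utf-8") == text)   # True: decoding restores the text
print(data.find(b"den"))              # 3: bytes support str-ish searching

buf = bytearray(data)                 # mutable copy
buf[0] = ord("r")
print(bytes(buf))                     # b'r\xc3\xbcden'
```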
New IO stack:
- c stdio has too many problems:
- don't know how many bytes buffered
- write after read unsafe
- windows text mode nightmares
- universal newlines hacked in (input only)
- bytes/str gives an opportunity to fix all this:
- learn from Java streams API
- stackable components: buffering, encoding, ...
- sandbox/sio/sio.py for an early prototype
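The stackable design eventually shipped as the `io` module (Python 2.6 and 3.0). This sketch stacks the layers by hand, as `open()` does internally: a raw file, a buffering layer, then a text layer that decodes UTF-8 and handles universal newlines. The temporary file and its contents are only for illustration.

```python
import io
import os
import tempfile

# Write some UTF-8 text with Windows line endings to a temporary file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("line one\r\nlíne two\r\n".encode("utf-8"))

# Stack the layers: raw bytes -> buffering -> decoding + newline translation.
raw = io.FileIO(path, "r")
buffered = io.BufferedReader(raw)
text = io.TextIOWrapper(buffered, encoding="utf-8", newline=None)  # universal newlines

lines = text.readlines()
text.close()  # closing the top layer closes the whole stack
os.remove(path)
print(lines)  # ['line one\n', 'líne two\n']
```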
Print becomes a function:
- print x, y, z becomes print(x, y, z)
- print >>f, x, y, z becomes print(x, y, z, file=f)
- alternative(s?) to skip the space/newline:
- printf(format, x, y, z)
- printraw(x, y, z)? or print(x, y, z, raw=True)
- why not f.print(x, y, z)?
- because that requires support in every file type
- why change at all?
- print-as-statement is a barrier to evolution of your program and the language (see PEP 3100)
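As it finally shipped (and via `from __future__ import print_function` on 2.6+), `print()` covers the "skip the space/newline" case with the `sep=` and `end=` keyword arguments instead of separate printf/printraw functions, and writes to a file with `file=`:

```python
import io

out = io.StringIO()
print("x", "y", "z", file=out)                  # old: print >>out, "x", "y", "z"
print("x", "y", "z", sep="", end="", file=out)  # no separators, no trailing newline
print(repr(out.getvalue()))
```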
dict.keys() and items() return a set view
dict.values() a bag (multiset) view
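On Python 3 the views behave as described: key views support set operations, values may contain duplicates like a multiset, and every view reflects later changes to the dict.

```python
d = {"a": 1, "b": 2, "c": 2}

keys = d.keys()
print(keys & {"a", "z"})   # {'a'}: key views support set operations
print(sorted(d.values()))  # [1, 2, 2]: values may repeat

d["z"] = 26
print("z" in keys)         # True: the view reflects the change
```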
Drop default inequalities:
- the default implementations of <, <=, >, >= aren't very useful (compare by address!)
- let these raise TypeError
- NB: the default implementations of ==, != should remain (comparing identity is useful)
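This is exactly what Python 3 does for classes that define no comparison methods: `==` and `!=` fall back to identity, while ordering raises `TypeError`.

```python
class Point:
    def __init__(self, x):
        self.x = x


p, q = Point(1), Point(1)
print(p == p, p == q, p != q)  # True False True: == falls back to identity

try:
    p < q
except TypeError as err:
    print("TypeError:", err)  # no default ordering
```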
Generic functions implementation
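Generic functions did not land in Python 3.0 in the form discussed; the standard library later gained single-dispatch generic functions as `functools.singledispatch` (Python 3.4). A small sketch of the idea, dispatching on the type of the first argument; the `describe` function is purely illustrative:

```python
from functools import singledispatch


@singledispatch
def describe(obj):
    """Fallback implementation for types without a registered handler."""
    return "object"


@describe.register(int)
def _(obj):
    return "int %d" % obj


@describe.register(list)
def _(obj):
    return "list of %d items" % len(obj)


print(describe(3), describe([1, 2]), describe("s"))  # int 3 list of 2 items object
```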
That's all folks!



