
html5lib is getting faster

I ran the benchmark again with revision 1014 of html5lib and noticed a major speedup in parsing, although it is still far behind the other libraries.

The benchmark itself ran in roughly 262 seconds instead of the previous 457 or so. It now takes about 57% of the old time: roughly 43% less time, or about 1.74 times faster.

These are the numbers from 30 August:

html5lib.HTMLParser, only HTML - time: 209.537359953, errors: 6019
html5lib.HTMLParser, only XML - time: 247.377570152, errors: 15566
Total: 456.914930

These are today's numbers:

html5lib.HTMLParser, only HTML - time: 97.3409409523, errors: 6019
html5lib.HTMLParser, only XML - time: 164.554941893, errors: 15566
Total: 261.895883
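Here is a quick Python check of the arithmetic behind these figures. It uses only the timings listed above. The `summarize` helper is purely illustrative:

```python
# Benchmark timings (seconds) copied from the two runs above.
before = {"HTML": 209.537359953, "XML": 247.377570152}
after = {"HTML": 97.3409409523, "XML": 164.554941893}


def summarize(old, new):
    """Return (old_total, new_total, time_ratio, percent_less_time, speedup)."""
    old_total = sum(old.values())
    new_total = sum(new.values())
    ratio = new_total / old_total      # fraction of the old time still needed
    percent_less = (1 - ratio) * 100   # reduction in wall-clock time
    speedup = old_total / new_total    # how many times faster overall
    return old_total, new_total, ratio, percent_less, speedup


old_total, new_total, ratio, percent_less, speedup = summarize(before, after)
print(f"total before: {old_total:.6f} s")
print(f"total after:  {new_total:.6f} s")
print(f"new/old time: {ratio:.1%}")
print(f"time saved:   {percent_less:.1f}%")
print(f"speedup:      {speedup:.2f}x")

# Per-parser speedup (old time divided by new time).
for name in before:
    print(f"{name}: {before[name] / after[name]:.2f}x")
```

The overall run needs about 57% of the old time, which is roughly 43% less time and a speedup of about 1.74x. The HTML-only run improved more (about 2.15x) than the XML-only run (about 1.50x).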

This is the profiling output:

         582972 function calls (579727 primitive calls) in 2.196 CPU seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.196    2.196 {execfile}
        1    0.049    0.049    2.196    2.196 html5lib_profile.py:1(<module>)
        1    0.002    0.002    1.584    1.584 html5lib_profile.py:3(html5lib_parse)
        1    0.000    0.000    1.580    1.580 html5parser.py:130(parse)
        1    0.053    0.053    1.580    1.580 html5parser.py:76(_parse)
     6832    0.071    0.000    1.173    0.000 tokenizer.py:88(__iter__)
     8216    0.572    0.000    0.686    0.000 inputstream.py:249(charsUntil)
        1    0.038    0.038    0.512    0.512 __init__.py:13(<module>)
     6763    0.061    0.000    0.477    0.000 tokenizer.py:298(dataState)
        1    0.029    0.029    0.439    0.439 html5parser.py:7(<module>)
        1    0.088    0.088    0.312    0.312 treebuilders/simpletree.py:1(<module>)
     2059    0.013    0.000    0.294    0.000 tokenizer.py:585(attributeValueDoubleQuotedState)
        1    0.039    0.039    0.196    0.196 saxutils.py:4(<module>)
        1    0.090    0.090    0.156    0.156 urllib.py:23(<module>)
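The listing above has the layout of Python's cProfile/pstats report, sorted by cumulative time. The contents of html5lib_profile.py are not shown, so the following is only a minimal sketch of how a similar report could be produced. The sample document, the `profile_parse` helper and the stdlib fallback are hypothetical. `html5lib.parse` is the entry point of current html5lib releases, not necessarily of the revision benchmarked here.

```python
import cProfile
import io
import pstats

try:
    import html5lib  # optional: profiled if installed

    def parse(doc):
        return html5lib.parse(doc)
except ImportError:
    from html.parser import HTMLParser

    def parse(doc):
        # Hypothetical stand-in when html5lib is missing: the stdlib parser.
        parser = HTMLParser()
        parser.feed(doc)
        parser.close()
        return parser

# Hypothetical sample input; the real benchmark documents are not shown.
SAMPLE = ("<html><body>"
          + '<p class="x">hello <a href="#">world</a></p>' * 500
          + "</body></html>")


def profile_parse(doc, limit=15):
    """Parse doc under cProfile and return the stats report as text."""
    profiler = cProfile.Profile()
    profiler.runcall(parse, doc)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)  # same ordering as above
    return out.getvalue()


if __name__ == "__main__":
    print(profile_parse(SAMPLE))
```

Sorting by "tottime" instead of "cumulative" puts functions that spend time in their own body first. In the report above, that would be charsUntil (0.572 s of 2.196 s).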
