<feed xmlns='http://www.w3.org/2005/Atom'>
<title>crawler, branch main</title>
<subtitle>Asynchronous web crawler in Python.
</subtitle>
<id>https://git.brandonirizarry.xyz/crawler/atom?h=main</id>
<link rel='self' href='https://git.brandonirizarry.xyz/crawler/atom?h=main'/>
<link rel='alternate' type='text/html' href='https://git.brandonirizarry.xyz/crawler/'/>
<updated>2026-04-17T22:33:12Z</updated>
<entry>
<title>refactor: wrap the fetching-task with the semaphore</title>
<updated>2026-04-17T22:33:12Z</updated>
<author>
<name>Brandon C. Irizarry</name>
<email>brandon.irizarry@gmail.com</email>
</author>
<published>2026-04-17T22:33:12Z</published>
<link rel='alternate' type='text/html' href='https://git.brandonirizarry.xyz/crawler/commit/?id=42022288cebc70e249ae0b979f500172be860d24'/>
<id>urn:sha1:42022288cebc70e249ae0b979f500172be860d24</id>
<content type='text'>
That is, acquire the semaphore around the task that runs 'fetch',
instead of around the code inside 'fetch'.

The current approach conveys our intention more concisely: 'fetch' is
the entirety of the asynchronous task we want to "rate limit".
</content>
</entry>
<entry>
<title>chore: untrack description</title>
<updated>2026-04-16T21:51:52Z</updated>
<author>
<name>Brandon C. Irizarry</name>
<email>brandon.irizarry@gmail.com</email>
</author>
<published>2026-04-16T21:51:52Z</published>
<link rel='alternate' type='text/html' href='https://git.brandonirizarry.xyz/crawler/commit/?id=48a32e67c8061c6d60cc1afe1547574e34885f5c'/>
<id>urn:sha1:48a32e67c8061c6d60cc1afe1547574e34885f5c</id>
<content type='text'>
</content>
</entry>
<entry>
<title>chore: add description for cgit</title>
<updated>2026-04-16T21:49:49Z</updated>
<author>
<name>Brandon C. Irizarry</name>
<email>brandon.irizarry@gmail.com</email>
</author>
<published>2026-04-16T21:49:49Z</published>
<link rel='alternate' type='text/html' href='https://git.brandonirizarry.xyz/crawler/commit/?id=0f02d1d7f657109acef231339a5e5a808a134325'/>
<id>urn:sha1:0f02d1d7f657109acef231339a5e5a808a134325</id>
<content type='text'>
</content>
</entry>
<entry>
<title>feat: configure max_concurrency from the command line</title>
<updated>2026-04-16T21:42:49Z</updated>
<author>
<name>Brandon C. Irizarry</name>
<email>brandon.irizarry@gmail.com</email>
</author>
<published>2026-04-16T21:42:49Z</published>
<link rel='alternate' type='text/html' href='https://git.brandonirizarry.xyz/crawler/commit/?id=0c055dc725f0b8f1b697495884d65dc3ad16537d'/>
<id>urn:sha1:0c055dc725f0b8f1b697495884d65dc3ad16537d</id>
<content type='text'>
</content>
</entry>
<entry>
<title>docs: add docstring to 'fetch'</title>
<updated>2026-04-16T21:36:54Z</updated>
<author>
<name>Brandon C. Irizarry</name>
<email>brandon.irizarry@gmail.com</email>
</author>
<published>2026-04-16T21:36:54Z</published>
<link rel='alternate' type='text/html' href='https://git.brandonirizarry.xyz/crawler/commit/?id=528f5c261ec6dc2d0090930d4e79a8fef72d3cbd'/>
<id>urn:sha1:528f5c261ec6dc2d0090930d4e79a8fef72d3cbd</id>
<content type='text'>
</content>
</entry>
<entry>
<title>feat: use a semaphore to limit how many tasks make requests</title>
<updated>2026-04-16T21:34:36Z</updated>
<author>
<name>Brandon C. Irizarry</name>
<email>brandon.irizarry@gmail.com</email>
</author>
<published>2026-04-16T21:34:36Z</published>
<link rel='alternate' type='text/html' href='https://git.brandonirizarry.xyz/crawler/commit/?id=28bb531bd6eec1a9139944fd5b5d1d05a08becd2'/>
<id>urn:sha1:28bb531bd6eec1a9139944fd5b5d1d05a08becd2</id>
<content type='text'>
</content>
</entry>
<entry>
<title>feat: use taskgroup + flush printing of '.' + catch timeouterror</title>
<updated>2026-04-16T21:25:43Z</updated>
<author>
<name>Brandon C. Irizarry</name>
<email>brandon.irizarry@gmail.com</email>
</author>
<published>2026-04-16T21:25:43Z</published>
<link rel='alternate' type='text/html' href='https://git.brandonirizarry.xyz/crawler/commit/?id=a6468fc6a28adc628e64c09ef1e953b8eaf0807a'/>
<id>urn:sha1:a6468fc6a28adc628e64c09ef1e953b8eaf0807a</id>
<content type='text'>
</content>
</entry>
<entry>
<title>feat: use full concurrency (no semaphores)</title>
<updated>2026-04-16T21:04:45Z</updated>
<author>
<name>Brandon C. Irizarry</name>
<email>brandon.irizarry@gmail.com</email>
</author>
<published>2026-04-16T21:04:45Z</published>
<link rel='alternate' type='text/html' href='https://git.brandonirizarry.xyz/crawler/commit/?id=aa716390836e5e58d9bf3987b24be3f2a4110245'/>
<id>urn:sha1:aa716390836e5e58d9bf3987b24be3f2a4110245</id>
<content type='text'>
</content>
</entry>
<entry>
<title>feat: add a timeout for sites that don't respond</title>
<updated>2026-04-16T20:36:52Z</updated>
<author>
<name>Brandon C. Irizarry</name>
<email>brandon.irizarry@gmail.com</email>
</author>
<published>2026-04-16T20:36:52Z</published>
<link rel='alternate' type='text/html' href='https://git.brandonirizarry.xyz/crawler/commit/?id=e771a3ebcee9a7c6c3efd08fa905cbe0591131b1'/>
<id>urn:sha1:e771a3ebcee9a7c6c3efd08fa905cbe0591131b1</id>
<content type='text'>
This should reduce the urge to reach for Ctrl+C.
</content>
</entry>
<entry>
<title>feat: ping sites sequentially and measure how long it takes</title>
<updated>2026-04-16T20:34:00Z</updated>
<author>
<name>Brandon C. Irizarry</name>
<email>brandon.irizarry@gmail.com</email>
</author>
<published>2026-04-16T20:34:00Z</published>
<link rel='alternate' type='text/html' href='https://git.brandonirizarry.xyz/crawler/commit/?id=172bccd05779240d6d6cd087a1c19216f0a5e225'/>
<id>urn:sha1:172bccd05779240d6d6cd087a1c19216f0a5e225</id>
<content type='text'>
</content>
</entry>
</feed>
