# Simon Willison's Weblog
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
## Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
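Ollama can pull GGUF files directly from Hugging Face, so the command would have looked something like this (the Q8_0 quantization tag is an assumption; Unsloth publish several):

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```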
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
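Those logs can be revisited later using LLM's built-in logs command:

```bash
llm logs path  # print the location of the SQLite database holding the logs
llm logs -n 3  # show the three most recent prompts and responses
```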
I installed the plugin and ran it like this:
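A sketch of those two steps, reusing the Hugging Face model reference assumed above:

```bash
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```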
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
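Presumably another Hugging Face pull; a 34GB download is roughly consistent with a Q3_K_M quantization of a 70B model, so the exact tag below is an assumption:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```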
## Can it draw a pelican?
I tried my classic *Generate an SVG of a pelican riding a bicycle* prompt too. It did not do very well:
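As an LLM invocation that would be something like the following; whether this ran against the 8B or the new 70B distill isn't stated here, so the model reference is an assumption:

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```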
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
## Other ways to try DeepSeek-R1
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
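A sketch of that entry, following the extra-openai-models.yaml format from LLM's documentation (the api_base is DeepSeek's documented endpoint; the other values match the commands below):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```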
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
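Spelled out, reusing the example prompt from earlier:

```bash
llm keys set deepseek
# paste your DeepSeek API key when prompted
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```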
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.