DeepSeek: the Chinese AI Model That's a Tech Breakthrough and a Security Risk

DeepSeek: at this stage, the only takeaway is that open-source models surpass proprietary ones. Everything else is problematic and I don't buy the public numbers.

DeepSink was built on top of open-source Meta projects (PyTorch, Llama) and ClosedAI is now in danger because its valuation is outrageous.

To my knowledge, no public documentation links DeepSeek directly to a specific "Test Time Scaling" technique, but that's highly probable, so allow me to simplify.

Test Time Scaling is used in machine learning to scale the model's performance at test time rather than during training.

That means fewer GPU hours and less powerful chips.

In other words, lower computational requirements and lower hardware costs.
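One well-known way to scale at test time is to sample several candidate answers and keep the most frequent one (majority voting, sometimes called self-consistency). The sketch below only illustrates that general idea: `noisy_solver` is a hypothetical stand-in for a model call, the 60% per-call accuracy is an assumption, and nothing here is confirmed to be what DeepSeek does.

```python
import random
from collections import Counter


def noisy_solver(question: int, rng: random.Random, p_correct: float = 0.6) -> int:
    """Hypothetical stand-in for one model call: the right answer is question * 2,
    but the call is off by one with probability 1 - p_correct."""
    correct = question * 2
    return correct if rng.random() < p_correct else correct + rng.choice([-1, 1])


def majority_vote(question: int, n_samples: int, rng: random.Random) -> int:
    """Spend extra compute at inference: sample n answers, keep the most common."""
    answers = [noisy_solver(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def estimate_accuracy(n_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(majority_vote(21, n_samples, rng) == 42 for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(n, round(estimate_accuracy(n), 3))
```

The same fixed "model" gets more reliable as the number of samples per question grows, with no retraining involved.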

That's why Nvidia lost nearly $600 billion in market cap, the biggest one-day loss in U.S. history!

Many people and institutions who shorted American AI stocks became incredibly rich in a few hours because investors now expect we will need less powerful AI chips ...

Nvidia short-sellers just made a single-day profit of $6.56 billion according to research from S3 Partners. That is nothing compared to the market cap, but I'm looking at the single-day amount. More than $6 billion in less than 12 hours is a lot in my book. And that's just for Nvidia. Short sellers of chipmaker Broadcom made more than $2 billion in profits in a few hours (the US stock market operates from 9:30 AM to 4:00 PM EST).
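To put those figures side by side, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above, treating "nearly $600 billion" as exactly $600 billion for simplicity.

```python
# Figures quoted in the text above.
nvidia_short_profit = 6.56e9      # USD, single-day short-seller profit (S3 Partners)
nvidia_market_cap_loss = 600e9    # USD, "nearly $600 billion", rounded up here

# Regular US trading session: 9:30 AM to 4:00 PM.
session_hours = (16 * 60 - (9 * 60 + 30)) / 60

profit_per_hour = nvidia_short_profit / session_hours
share_of_cap_loss = nvidia_short_profit / nvidia_market_cap_loss

print(f"session length: {session_hours} h")
print(f"profit per trading hour: ${profit_per_hour / 1e9:.2f}B")
print(f"share of the market-cap loss: {share_of_cap_loss:.2%}")
```

That works out to roughly $1 billion per trading hour, or about 1% of the market value Nvidia lost that day.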

The Nvidia Short Interest Over Time data shows we had the second highest level in January 2025 at $39B, but this is outdated because the last record date was Jan 15, 2025 - we have to wait for the latest data!

A tweet I saw 13 hours after publishing my article! Perfect summary.

Distilled language models

Small language models are trained on a smaller scale. What makes them different isn't just the capabilities, it is how they have been built. A distilled language model is a smaller, more efficient model created by transferring the knowledge from a larger, more complex model like the future ChatGPT 5.

Imagine we have a teacher model (GPT5), which is a large language model: a deep neural network trained on a lot of data. It is highly resource-intensive, which is a problem when there's limited computational power or when you need speed.

The knowledge from this teacher model is then "distilled" into a student model. The student model is simpler and has fewer parameters/layers, which makes it lighter: less memory usage and lower computational demands.

During distillation, the student model is trained not only on the raw data but also on the outputs or the "soft targets" (probabilities for each class rather than hard labels) produced by the teacher model.

With distillation, the student model learns from both the original data and the detailed predictions (the "soft targets") made by the teacher model.

In other words, the student model doesn't just learn from "soft targets" but also from the same training data used for the teacher, with the guidance of the teacher's outputs. That's how knowledge transfer is optimized: dual learning from data and from the teacher's predictions!

Ultimately, the student mimics the teacher's decision-making process ... all while using much less computational power!
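A common textbook way to combine the two learning signals is a weighted sum of a hard-label loss and a soft-target loss. The sketch below is a minimal, dependency-free version of that formulation; the `temperature` and `alpha` values and the example logits are assumptions chosen for illustration, not anyone's published settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """alpha * cross-entropy(hard label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Learning from the original data: cross-entropy against the hard label.
    student_probs = softmax(student_logits)
    hard_loss = -math.log(student_probs[hard_label])

    # Learning from the teacher: KL divergence between the softened distributions.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_loss = sum(t * math.log(t / s) for t, s in zip(teacher_soft, student_soft))

    return alpha * hard_loss + (1 - alpha) * temperature ** 2 * soft_loss


if __name__ == "__main__":
    teacher = [4.0, 1.5, 0.2]
    print(distillation_loss([3.5, 1.2, 0.1], teacher, hard_label=0))  # student close to teacher
    print(distillation_loss([0.1, 3.0, 0.5], teacher, hard_label=0))  # student far from teacher
```

A student whose scores resemble the teacher's gets a lower loss than one that disagrees, which is the pressure that makes the student imitate the teacher.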

But here's the twist as I understand it: DeepSeek didn't just extract content from a single large language model like ChatGPT 4. It relied on many large language models, including open-source ones like Meta's Llama.

So now we are distilling not one LLM but multiple LLMs. That was one of the "genius" ideas: mixing different architectures and datasets to create a seriously adaptable and robust small language model!
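One simple way to picture distilling from several teachers is to average their softened predictions into a single set of soft targets. The sketch below assumes all teachers score the same set of classes, and the three teachers and their logits are hypothetical. It is not a description of DeepSeek's actual procedure, which I cannot confirm.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits_list, weights=None, temperature=2.0):
    """Weighted average of each teacher's softened distribution."""
    n = len(teacher_logits_list)
    weights = weights or [1.0 / n] * n
    dists = [softmax(logits, temperature) for logits in teacher_logits_list]
    n_classes = len(dists[0])
    return [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(n_classes)]


if __name__ == "__main__":
    teachers = [[4.0, 1.0, 0.0],   # hypothetical teacher A
                [3.0, 2.5, 0.0],   # hypothetical teacher B
                [3.5, 0.5, 1.0]]   # hypothetical teacher C
    targets = ensemble_soft_targets(teachers)
    print([round(p, 3) for p in targets])
```

The averaged distribution can then be used as the soft target for the student, so disagreements between teachers show up as extra uncertainty rather than as a single teacher's bias.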

DeepSeek: Less supervision

Another essential innovation: less human supervision/guidance.

The question is: how far can models go with less human-labeled data?

R1-Zero learned "reasoning" capabilities through trial and error. It evolves and has unique "reasoning behaviors", which can lead to noise, endless repetition, and language mixing.

R1-Zero was experimental: there was no initial guidance from labeled data.

DeepSeek-R1 is different: it used a structured training pipeline that includes both supervised fine-tuning and reinforcement learning (RL). It started with initial fine-tuning, followed by RL to refine and enhance its reasoning capabilities.

The end result? Less noise and no language mixing, unlike R1-Zero.

R1 uses human-like reasoning patterns first and then advances through RL. The innovation here is less human-labeled data + RL to both guide and refine the model's performance.
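The "a little supervision first, then RL" idea can be shown on a toy problem. In the sketch below, a policy over three possible answers gets a few supervised updates from labeled examples and then many REINFORCE updates driven only by an automatic reward. The three-answer setup, the reward function and the learning rates are all assumptions made for illustration. This is not DeepSeek's training algorithm.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sft_step(logits, label, lr=0.5):
    """Supervised step: the cross-entropy gradient pushes toward the labeled answer."""
    probs = softmax(logits)
    return [z + lr * ((1.0 if k == label else 0.0) - p)
            for k, (z, p) in enumerate(zip(logits, probs))]


def rl_step(logits, reward_fn, rng, lr=0.5):
    """REINFORCE step: sample an answer, score it, reinforce it in proportion to the reward."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    reward = reward_fn(action)
    return [z + lr * reward * ((1.0 if k == action else 0.0) - p)
            for k, (z, p) in enumerate(zip(logits, probs))]


def train(n_sft=3, n_rl=200, seed=0):
    rng = random.Random(seed)
    correct = 2  # hypothetical index of the right answer

    def reward_fn(action):
        # Automatic check, no human label needed at this stage.
        return 1.0 if action == correct else 0.0

    logits = [0.0, 0.0, 0.0]
    p_init = softmax(logits)[correct]
    for _ in range(n_sft):   # stage 1: a handful of labeled examples
        logits = sft_step(logits, correct)
    p_sft = softmax(logits)[correct]
    for _ in range(n_rl):    # stage 2: many reward-only updates
        logits = rl_step(logits, reward_fn, rng)
    p_rl = softmax(logits)[correct]
    return p_init, p_sft, p_rl


if __name__ == "__main__":
    print(train())
```

The supervised stage gives the policy a head start, so the RL stage samples the rewarded answer more often and needs fewer labeled examples overall.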

My question is: did DeepSeek really solve the problem, knowing they extracted a lot of data from the datasets of LLMs, which all learned from human supervision? In other words, is the traditional dependency really broken when they rely on previously trained models?

Let me show you a live real-world screenshot shared by Alexandre Blanc today. It shows training data extracted from other models (here, ChatGPT) that have learned from human supervision ... I am not convinced yet that the traditional dependency is broken. It is "easy" to not need massive amounts of high-quality reasoning data for training when taking shortcuts ...

To be balanced and show the research, I've uploaded the DeepSeek R1 Paper (downloadable PDF, 22 pages).

My concerns regarding DeepSink?

Both the web and mobile apps collect your IP, keystroke patterns, and device information, and everything is stored on servers in China.

Keystroke pattern analysis is a behavioral biometric method used to identify and authenticate individuals based on their unique typing patterns.
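To make the idea concrete, keystroke analysis typically relies on timing features such as how long each key is held (dwell time) and the gap between releasing one key and pressing the next (flight time). The sketch below is a deliberately crude illustration: the event format, the sample timings and the 25 ms tolerance are assumptions, and it does not reflect how any particular app processes keystrokes.

```python
from statistics import mean


def timing_features(events):
    """events: list of (key, press_ms, release_ms) tuples in typing order.
    Returns (mean dwell time, mean flight time) in milliseconds."""
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return mean(dwell), mean(flight)


def matches_profile(sample, profile, tolerance_ms=25.0):
    """Crude check: both features within a fixed tolerance of the enrolled profile."""
    return all(abs(s - p) <= tolerance_ms for s, p in zip(sample, profile))


if __name__ == "__main__":
    # Hypothetical timings for the same four keys typed by two people.
    enrolled = [("d", 0, 95), ("e", 140, 230), ("e", 290, 385), ("p", 430, 520)]
    same_user = [("d", 0, 100), ("e", 150, 240), ("e", 300, 390), ("p", 445, 540)]
    other_user = [("d", 0, 40), ("e", 260, 300), ("e", 540, 585), ("p", 800, 845)]

    profile = timing_features(enrolled)
    print(matches_profile(timing_features(same_user), profile))
    print(matches_profile(timing_features(other_user), profile))
```

Even two numbers per typing sample are enough to separate these two hypothetical typists, which is why collecting keystroke patterns is a privacy concern and not just a technical detail.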

I can hear the "But 0p3n s0urc3 ...!" comments.

Yes, open source is great, but this reasoning is limited because it does NOT consider human psychology.

Regular users will never run models locally.

Most will simply want quick answers.

Technically unsophisticated users will use the web and mobile versions.

Millions have already downloaded the mobile app on their phone.

DeepSeek's models have a real edge and that's why we see ultra-fast user adoption. For now, they are superior to Google's Gemini or OpenAI's ChatGPT in many ways. R1 scores high on objective benchmarks, no doubt about that.

I suggest searching for anything sensitive that does not align with the Party's propaganda on the web or mobile app, and the output will speak for itself ...

China vs America

Screenshots by T. Cassel. Freedom of speech is beautiful. I could share terrible examples of propaganda and censorship but I won't. Just do your own research. I'll end with DeepSeek's privacy policy, which you can read on their website. This is a simple screenshot, nothing more.

Rest assured, your code, ideas and conversations will never be archived! As for the real investments behind DeepSeek, we have no idea if they are in the hundreds of millions or in the billions. We just know the $5.6M amount the media has been pushing left and right is !
