Academic Paper

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Document Type
Working Paper
Author
Reid, Machel; Savinov, Nikolay; Teplyashin, Denis; Lepikhin, Dmitry; Lillicrap, Timothy; Alayrac, Jean-baptiste; Soricut, Radu; Lazaridou, Angeliki; Firat, Orhan; Schrittwieser, Julian; Antonoglou, Ioannis; Anil, Rohan; Borgeaud, Sebastian; Dai, Andrew; Millican, Katie; Dyer, Ethan; Glaese, Mia; Sottiaux, Thibault; Lee, Benjamin; Viola, Fabio; Reynolds, Malcolm; Xu, Yuanzhong; Molloy, James; Chen, Jilin; Isard, Michael; Barham, Paul; Hennigan, Tom; McIlroy, Ross; Johnson, Melvin; Schalkwyk, Johan; Collins, Eli; Rutherford, Eliza; Moreira, Erica; Ayoub, Kareem; Goel, Megha; Meyer, Clemens; Thornton, Gregory; Yang, Zhen; Michalewski, Henryk; Abbas, Zaheer; Schucher, Nathan; Anand, Ankesh; Ives, Richard; Keeling, James; Lenc, Karel; Haykal, Salem; Shakeri, Siamak; Shyam, Pranav; Chowdhery, Aakanksha; Ring, Roman; Spencer, Stephen; Sezener, Eren; Vilnis, Luke; Chang, Oscar; Morioka, Nobuyuki; Tucker, George; Zheng, Ce; Woodman, Oliver; Attaluri, Nithya; Kocisky, Tomas; Eltyshev, Evgenii; Chen, Xi; Chung, Timothy; Selo, Vittorio; Brahma, Siddhartha; Georgiev, Petko; Slone, Ambrose; Zhu, Zhenkai; Lottes, James; Qiao, Siyuan; Caine, Ben; Riedel, Sebastian; Tomala, Alex; Chadwick, Martin; Love, Juliette; Choy, Peter; Mittal, Sid; Houlsby, Neil; Tang, Yunhao; Lamm, Matthew; Bai, Libin; Zhang, Qiao; He, Luheng; Cheng, Yong; Humphreys, Peter; Li, Yujia; Brin, Sergey; Cassirer, Albin; Miao, Yingjie; Zilka, Lukas; Tobin, Taylor; Xu, Kelvin; Proleev, Lev; Sohn, Daniel; Magni, Alberto; Hendricks, Lisa Anne; Gao, Isabel; Ontañón, Santiago; Bunyan, Oskar; Byrd, Nathan; Sharma, Abhanshu; Zhang, Biao; Pinto, Mario; Sinha, Rishika; Mehta, Harsh; Jia, Dawei; Caelles, Sergi; Webson, Albert; Morris, Alex; Roelofs, Becca; Ding, Yifan; Strudel, Robin; Xiong, Xuehan; Ritter, Marvin; Dehghani, Mostafa; Chaabouni, Rahma; Karmarkar, Abhijit; Lai, Guangda; Mentzer, Fabian; Xu, Bibo; Li, YaGuang; Zhang, Yujing; Paine, Tom Le; Goldin, Alex; Neyshabur, Behnam; Baumli, Kate; Levskaya, Anselm; Laskin, Michael; Jia, Wenhao; Rae, Jack W.; Xiao, Kefan; He, Antoine; Giordano, Skye; Yagati, Lakshman; Lespiau, Jean-Baptiste; Natsev, Paul; Ganapathy, Sanjay; Liu, Fangyu; Martins, Danilo; Chen, Nanxin; Xu, Yunhan; Barnes, Megan; May, Rhys; Vezer, Arpi; Oh, Junhyuk; Franko, Ken; Bridgers, Sophie; Zhao, Ruizhe; Wu, Boxi; Mustafa, Basil; Sechrist, Sean; Parisotto, Emilio; Pillai, Thanumalayan Sankaranarayana; Larkin, Chris; Gu, Chenjie; Sorokin, Christina; Krikun, Maxim; Guseynov, Alexey; Landon, Jessica; Datta, Romina; Pritzel, Alexander; Thacker, Phoebe; Yang, Fan; Hui, Kevin; Hauth, Anja; Yeh, Chih-Kuan; Barker, David; Mao-Jones, Justin; Austin, Sophia; Sheahan, Hannah; Schuh, Parker; Svensson, James; Jain, Rohan; Ramasesh, Vinay; Briukhov, Anton; Chung, Da-Woon; von Glehn, Tamara; Butterfield, Christina; Jhakra, Priya; Wiethoff, Matthew; Frye, Justin; Grimstad, Jordan; Changpinyo, Beer; Lan, Charline Le; Bortsova, Anna; Wu, Yonghui; Voigtlaender, Paul; Sainath, Tara; Smith, Charlotte; Hawkins, Will; Cao, Kris; Besley, James; Srinivasan, Srivatsan; Omernick, Mark; Gaffney, Colin; Surita, Gabriela; Burnell, Ryan; Damoc, Bogdan; Ahn, Junwhan; Brock, Andrew; Pajarskas, Mantas; Petrushkina, Anastasia; Noury, Seb; Blanco, Lorenzo; Swersky, Kevin; Ahuja, Arun; Avrahami, Thi; Misra, Vedant; de Liedekerke, Raoul; Iinuma, Mariko; Polozov, Alex; York, Sarah; Driessche, George van den; Michel, Paul; Chiu, Justin; Blevins, Rory; Gleicher, Zach; Recasens, Adrià; Rrustemi, Alban; Gribovskaya, Elena; Roy, Aurko; Gworek, Wiktor; Arnold, Séb; Lee, Lisa; Lee-Thorp, James; Maggioni, Marcello; Piqueras, Enrique; Badola, Kartikeya; Vikram, Sharad; Gonzalez, Lucas; Baddepudi, Anirudh; Senter, Evan; Devlin, Jacob; Qin, James; Azzam, Michael; Trebacz, Maja; Polacek, Martin; Krishnakumar, Kashyap; Chang, Shuo-yiin; Tung, Matthew; Penchev, Ivo; Joshi, Rishabh; Olszewska, Kate; Muir, Carrie; Wirth, Mateo; Hartman, Ale Jakse; Newlan, Josh; Kashem, Sheleem; Bolina, Vijay; Dabir, Elahe; van Amersfoort, Joost; Ahmed, Zafarali; Cobon-Kerr, James; Kamath, Aishwarya; Hrafnkelsson, Arnar Mar; Hou, Le; Mackinnon, Ian; Frechette, Alexandre; Noland, Eric; Si, Xiance; Taropa, Emanuel; Li, Dong; Crone, Phil; Gulati, Anmol; Cevey, Sébastien; Adler, Jonas; Ma, Ada; Silver, David; Tokumine, Simon; Powell, Richard; Lee, Stephan; Chang, Michael; Hassan, Samer; Mincu, Diana; Yang, Antoine; Levine, Nir; Brennan, Jenny; Wang, Mingqiu; Hodkinson, Sarah; Zhao, Jeffrey; Lipschultz, Josh; Pope, Aedan; Chang, Michael B.; Li, Cheng; Shafey, Laurent El; Paganini, Michela; Douglas, Sholto; Bohnet, Bernd; Pardo, Fabio; Odoom, Seth; Rosca, Mihaela; Santos, Cicero Nogueira dos; Soparkar, Kedar; Guez, Arthur; Hudson, Tom; Hansen, Steven; Asawaroengchai, Chulayuth; Addanki, Ravi; Yu, Tianhe; Stokowiec, Wojciech; Khan, Mina; Gilmer, Justin; Lee, Jaehoon; Bostock, Carrie Grimes; Rong, Keran; Caton, Jonathan; Pejman, Pedram; Pavetic, Filip; Brown, Geoff; Sharma, Vivek; Lučić, Mario; Samuel, Rajkumar; Djolonga, Josip; Mandhane, Amol; Sjösund, Lars Lowe; Buchatskaya, Elena; White, Elspeth; Clay, Natalie; Jiang, Jiepu; Lim, Hyeontaek; Hemsley, Ross; Labanowski, Jane; De Cao, Nicola; Steiner, David; Hashemi, Sayed Hadi; Austin, Jacob; Gergely, Anita; Blyth, Tim; Stanton, Joe; Shivakumar, Kaushik; Siddhant, Aditya; Andreassen, Anders; Araya, Carlos; Sethi, Nikhil; Shivanna, Rakesh; Hand, Steven; Bapna, Ankur; Khodaei, Ali; Miech, Antoine; Tanzer, Garrett; Swing, Andy; Thakoor, Shantanu; Pan, Zhufeng; Nado, Zachary; Winkler, Stephanie; Yu, Dian; Saleh, Mohammad; Maggiore, Loren; Barr, Iain; Giang, Minh; Kagohara, Thais; Danihelka, Ivo; Marathe, Amit; Feinberg, Vladimir; Elhawaty, Mohamed; Ghelani, Nimesh; Horgan, Dan; Miller, Helen; Walker, Lexi; Tanburn, Richard; Tariq, Mukarram; Shrivastava, Disha; Xia, Fei; Chiu, Chung-Cheng; Ashwood, Zoe; Baatarsukh, Khuslen; Samangooei, Sina; Alcober, Fred; Stjerngren, Axel; Komarek, Paul; Tsihlas, Katerina; Boral, Anudhyan; Comanescu, Ramona; Chen, Jeremy; Liu, Ruibo; Bloxwich, Dawn; Chen, Charlie; Sun, Yanhua; Feng, Fangxiaoyu; Mauger, Matthew; Dotiwalla, Xerxes; Hellendoorn, Vincent; Sharman, Michael; Zheng, Ivy; Haridasan, Krishna; Barth-Maron, Gabe; Swanson, Craig; Rogozińska, Dominika; Andreev, Alek; Rubenstein, Paul Kishan; Sang, Ruoxin; Hurt, Dan; Elsayed, Gamaleldin; Wang, Renshen; Lacey, Dave; Ilić, Anastasija; Zhao, Yao; Aroyo, Lora; Iwuanyanwu, Chimezie; Nikolaev, Vitaly; Lakshminarayanan, Balaji; Jazayeri, Sadegh; Kaufman, Raphaël Lopez; Varadarajan, Mani; Tekur, Chetan; Fritz, Doug; Khalman, Misha; Reitter, David; Dasgupta, Kingshuk; Sarcar, Shourya; Ornduff, Tina; Snaider, Javier; Huot, Fantine; Jia, Johnson; Kemp, Rupert; Trdin, Nejc; Vijayakumar, Anitha; Kim, Lucy; Angermueller, Christof; Lao, Li; Liu, Tianqi; Zhang, Haibin; Engel, David; Greene, Somer; White, Anaïs; Austin, Jessica; Taylor, Lilly; Ashraf, Shereen; Liu, Dangyi; Georgaki, Maria; Cai, Irene; Kulizhskaya, Yana; Goenka, Sonam; Saeta, Brennan; Vodrahalli, Kiran; Frank, Christian; de Cesare, Dario; Robenek, Brona; Richardson, Harry; Alnahlawi, Mahmoud; Yew, Christopher; Ponnapalli, Priya; Tagliasacchi, Marco; Korchemniy, Alex; Kim, Yelin; Li, Dinghua; Rosgen, Bill; Levin, Kyle; Wiesner, Jeremy; Banzal, Praseem; Srinivasan, Praveen; Yu, Hongkun; Ünlü, Çağlar; Reid, David; Tung, Zora; Finchelstein, Daniel; Kumar, Ravin; Elisseeff, Andre; Huang, Jin; Zhang, Ming; Zhu, Rui; Aguilar, Ricardo; Giménez, Mai; Xia, Jiawei; Dousse, Olivier; Gierke, Willi; Yeganeh, Soheil Hassas; Yates, Damion; Jalan, Komal; Li, Lu; Latorre-Chimoto, Eri; Nguyen, Duc Dung; Durden, Ken; Kallakuri, Praveen; Liu, Yaxin; Johnson, Matthew; Tsai, Tomy; Talbert, Alice; Liu, Jasmine; Neitz, Alexander; Elkind, Chen; Selvi, Marco; Jasarevic, Mimi; Soares, Livio Baldini; Cui, Albert; Wang, Pidong; Wang, Alek Wenjiao; Ye, Xinyu; Kallarackal, Krystal; Loher, Lucia; Lam, Hoi; Broder, Josef; Holtmann-Rice, Dan; Martin, Nina; Ramadhana, Bramandia; Toyama, Daniel; Shukla, Mrinal; Basu, Sujoy; Mohan, Abhi; Fernando, Nick; Fiedel, Noah; Paterson, Kim; Li, Hui; Garg, Ankush; Park, Jane; Choi, DongHyun; Wu, Diane; Singh, Sankalp; Zhang, Zhishuai; Globerson, Amir; Yu, Lily; Carpenter, John; Quitry, Félix de Chaumont; Radebaugh, Carey; Lin, Chu-Cheng; Tudor, Alex; Shroff, Prakash; Garmon, Drew; Du, Dayou; Vats, Neera; Lu, Han; Iqbal, Shariq; Yakubovich, Alex; Tripuraneni, Nilesh; Manyika, James; Qureshi, Haroon; Hua, Nan; Ngani, Christel; Raad, Maria Abi; Forbes, Hannah; Bulanova, Anna; Stanway, Jeff; Sundararajan, Mukund; Ungureanu, Victor; Bishop, Colton; Li, Yunjie; Venkatraman, Balaji; Li, Bo; Thornton, Chloe; Scellato, Salvatore; Gupta, Nishesh; Wang, Yicheng; Tenney, Ian; Wu, Xihui; Shenoy, Ashish; Carvajal, Gabriel; Wright, Diana Gage; Bariach, Ben; Xiao, Zhuyun; Hawkins, Peter; Dalmia, Sid; Farabet, Clement; Valenzuela, Pedro; Yuan, Quan; Welty, Chris; Agarwal, Ananth; Chen, Mia; Kim, Wooyeol; Hulse, Brice; Dukkipati, Nandita; Paszke, Adam; Bolt, Andrew; Davoodi, Elnaz; Choo, Kiam; Beattie, Jennifer; Prendki, Jennifer; Vashisht, Harsha; Santamaria-Fernandez, Rebeca; Cobo, Luis C.; Wilkiewicz, Jarek; Madras, David; Elqursh, Ali; Uy, Grant; Ramirez, Kevin; Harvey, Matt; Liechty, Tyler; Zen, Heiga; Seibert, Jeff; Hu, Clara Huiyi; Khorlin, Andrey; Le, Maigo; Aharoni, Asaf; Li, Megan; Wang, Lily; Kumar, Sandeep; Lince, Alejandro; Casagrande, Norman; Hoover, Jay; Badawy, Dalia El; Soergel, David; Vnukov, Denis; Miecnikowski, Matt; Simsa, Jiri; Koop, Anna; Kumar, Praveen; Sellam, Thibault; Vlasic, Daniel; Daruki, Samira; Shabat, Nir; Zhang, John; Su, Guolong; Zhang, Jiageng; Liu, Jeremiah; Sun, Yi; Palmer, Evan; Ghaffarkhah, Alireza; Xiong, Xi; Cotruta, Victor; Fink, Michael; Dixon, Lucas; Sreevatsa, Ashwin; Goedeckemeyer, Adrian; Dimitriev, Alek; Jafari, Mohsen; Crocker, Remi; FitzGerald, Nicholas; Kumar, Aviral; Ghemawat, Sanjay; Philips, Ivan; Liu, Frederick; Liang, Yannie; Sterneck, Rachel; Repina, Alena; Wu, Marcus; Knight, Laura; Georgiev, Marin; Lee, Hyo; Askham, Harry; Chakladar, Abhishek; Louis, Annie; Crous, Carl; Cate, Hardie; Petrova, Dessie; Quinn, Michael; Owusu-Afriyie, Denese; Singhal, Achintya; Wei, Nan; Kim, Solomon; Vincent, Damien; Nasr, Milad; Choquette-Choo, Christopher A.; Tojo, Reiko; Lu, Shawn; Casas, Diego de Las; Cheng, Yuchung; Bolukbasi, Tolga; Lee, Katherine; Fatehi, Saaber; Ananthanarayanan, Rajagopal; Patel, Miteyan; Kaed, Charbel; Li, Jing; Sygnowski, Jakub; Belle, Shreyas Rammohan; Chen, Zhe; Konzelmann, Jaclyn; Põder, Siim; Garg, Roopal; Koverkathu, Vinod; Brown, Adam; Dyer, Chris; Liu, Rosanne; Nova, Azade; Xu, Jun; Petrov, Slav; Hassabis, Demis; Kavukcuoglu, Koray; Dean, Jeffrey; Vinyals, Oriol
Source
Subject
Computer Science - Computation and Language
Computer Science - Artificial Intelligence
Language
Abstract
In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state of the art in long-document QA, long-video QA and long-context ASR, and matches or surpasses Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5 Pro's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 2.1 (200k) and GPT-4 Turbo (128k). Finally, we highlight surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.