Audio plugin host https://kx.studio/carla
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

12270 lines
502KB

  1. /*
  2. FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
  3. dr_flac - v0.12.31 - 2021-08-16
  4. David Reid - mackron@gmail.com
  5. GitHub: https://github.com/mackron/dr_libs
  6. */
  7. /*
  8. RELEASE NOTES - v0.12.0
  9. =======================
  10. Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.
  11. Improved Client-Defined Memory Allocation
  12. -----------------------------------------
  13. The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
  14. existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE are still in place and will be used by default when no custom
  15. allocation callbacks are specified.
  16. To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:
  17. void* my_malloc(size_t sz, void* pUserData)
  18. {
  19. return malloc(sz);
  20. }
  21. void* my_realloc(void* p, size_t sz, void* pUserData)
  22. {
  23. return realloc(p, sz);
  24. }
  25. void my_free(void* p, void* pUserData)
  26. {
  27. free(p);
  28. }
  29. ...
  30. drflac_allocation_callbacks allocationCallbacks;
  31. allocationCallbacks.pUserData = &myData;
  32. allocationCallbacks.onMalloc = my_malloc;
  33. allocationCallbacks.onRealloc = my_realloc;
  34. allocationCallbacks.onFree = my_free;
  35. drflac* pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);
  36. The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.
  37. Passing in null for the allocation callbacks object will cause dr_flac to use defaults which is the same as DRFLAC_MALLOC,
  38. DRFLAC_REALLOC and DRFLAC_FREE and the equivalent of how it worked in previous versions.
  39. Every API that opens a drflac object now takes this extra parameter. These include the following:
  40. drflac_open()
  41. drflac_open_relaxed()
  42. drflac_open_with_metadata()
  43. drflac_open_with_metadata_relaxed()
  44. drflac_open_file()
  45. drflac_open_file_with_metadata()
  46. drflac_open_memory()
  47. drflac_open_memory_with_metadata()
  48. drflac_open_and_read_pcm_frames_s32()
  49. drflac_open_and_read_pcm_frames_s16()
  50. drflac_open_and_read_pcm_frames_f32()
  51. drflac_open_file_and_read_pcm_frames_s32()
  52. drflac_open_file_and_read_pcm_frames_s16()
  53. drflac_open_file_and_read_pcm_frames_f32()
  54. drflac_open_memory_and_read_pcm_frames_s32()
  55. drflac_open_memory_and_read_pcm_frames_s16()
  56. drflac_open_memory_and_read_pcm_frames_f32()
  57. Optimizations
  58. -------------
  59. Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
  60. improves performance over the brute force method which was used when no seek table was present. Seek table based seeking also takes
  61. advantage of the new binary search seeking system to further improve performance there as well. Note that this depends on CRC which
  62. means it will be disabled when DR_FLAC_NO_CRC is used.
  63. The SSE4.1 pipeline has been cleaned up and optimized. You should see some improvements with decoding speed of 24-bit files in
  64. particular. 16-bit streams should also see some improvement.
  65. drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed it's s32
  66. to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.
  67. A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
  68. channel reconstruction which is the last part of the decoding process.
  69. The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
  70. compiling with GCC and Clang which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
  71. compile time and the REV instruction requires ARM architecture version 6.
  72. An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.
  73. Removed APIs
  74. ------------
  75. The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:
  76. drflac_read_s32() -> drflac_read_pcm_frames_s32()
  77. drflac_read_s16() -> drflac_read_pcm_frames_s16()
  78. drflac_read_f32() -> drflac_read_pcm_frames_f32()
  79. drflac_seek_to_sample() -> drflac_seek_to_pcm_frame()
  80. drflac_open_and_decode_s32() -> drflac_open_and_read_pcm_frames_s32()
  81. drflac_open_and_decode_s16() -> drflac_open_and_read_pcm_frames_s16()
  82. drflac_open_and_decode_f32() -> drflac_open_and_read_pcm_frames_f32()
  83. drflac_open_and_decode_file_s32() -> drflac_open_file_and_read_pcm_frames_s32()
  84. drflac_open_and_decode_file_s16() -> drflac_open_file_and_read_pcm_frames_s16()
  85. drflac_open_and_decode_file_f32() -> drflac_open_file_and_read_pcm_frames_f32()
  86. drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
  87. drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
  88. drflac_open_and_decode_memory_f32() -> drflac_open_memroy_and_read_pcm_frames_f32()
  89. Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
  90. to the old per-sample APIs. You now need to use the "pcm_frame" versions.
  91. */
  92. /*
  93. Introduction
  94. ============
  95. dr_flac is a single file library. To use it, do something like the following in one .c file.
  96. ```c
  97. #define DR_FLAC_IMPLEMENTATION
  98. #include "dr_flac.h"
  99. ```
  100. You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:
  101. ```c
  102. drflac* pFlac = drflac_open_file("MySong.flac", NULL);
  103. if (pFlac == NULL) {
  104. // Failed to open FLAC file
  105. }
  106. drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
  107. drflac_uint64 numberOfInterleavedSamplesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
  108. ```
  109. The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
  110. should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
  111. a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
  112. You do not need to decode the entire stream in one go - you just specify how many samples you'd like at any given time and the decoder will give you as many
  113. samples as it can, up to the amount requested. Later on when you need the next batch of samples, just call it again. Example:
  114. ```c
  115. while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
  116. do_something();
  117. }
  118. ```
  119. You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.
  120. If you just want to quickly decode an entire FLAC file in one go you can do something like this:
  121. ```c
  122. unsigned int channels;
  123. unsigned int sampleRate;
  124. drflac_uint64 totalPCMFrameCount;
  125. drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
  126. if (pSampleData == NULL) {
  127. // Failed to open and decode FLAC file.
  128. }
  129. ...
  130. drflac_free(pSampleData, NULL);
  131. ```
  132. You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
  133. should be considered lossy.
  134. If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metdata()` or `drflac_open_memory_with_metadata()`.
  135. The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
  136. reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metdata()` returns.
  137. The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. The presents a problem in certain scenarios such as broadcast style
  138. streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:
  139. `drflac_open_relaxed()`
  140. `drflac_open_with_metadata_relaxed()`
  141. It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
  142. APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.
  143. Build Options
  144. =============
  145. #define these options before including this file.
  146. #define DR_FLAC_NO_STDIO
  147. Disable `drflac_open_file()` and family.
  148. #define DR_FLAC_NO_OGG
  149. Disables support for Ogg/FLAC streams.
  150. #define DR_FLAC_BUFFER_SIZE <number>
  151. Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
  152. Larger values means more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
  153. you have a very efficient implementation of onRead(), or increase it if it's very inefficient. Must be a multiple of 8.
  154. #define DR_FLAC_NO_CRC
  155. Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
  156. be used if available. Otherwise the seek will be performed using brute force.
  157. #define DR_FLAC_NO_SIMD
  158. Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.
  159. Notes
  160. =====
  161. - dr_flac does not support changing the sample rate nor channel count mid stream.
  162. - dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
  163. - When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
  164. to differences in corrupted stream recorvery logic between the two APIs.
  165. */
  166. #ifndef dr_flac_h
  167. #define dr_flac_h
  168. #ifdef __cplusplus
  169. extern "C" {
  170. #endif
  171. #define DRFLAC_STRINGIFY(x) #x
  172. #define DRFLAC_XSTRINGIFY(x) DRFLAC_STRINGIFY(x)
  173. #define DRFLAC_VERSION_MAJOR 0
  174. #define DRFLAC_VERSION_MINOR 12
  175. #define DRFLAC_VERSION_REVISION 31
  176. #define DRFLAC_VERSION_STRING DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)
  177. #include <stddef.h> /* For size_t. */
  178. /* Sized types. */
  179. typedef signed char drflac_int8;
  180. typedef unsigned char drflac_uint8;
  181. typedef signed short drflac_int16;
  182. typedef unsigned short drflac_uint16;
  183. typedef signed int drflac_int32;
  184. typedef unsigned int drflac_uint32;
  185. #if defined(_MSC_VER)
  186. typedef signed __int64 drflac_int64;
  187. typedef unsigned __int64 drflac_uint64;
  188. #else
  189. #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
  190. #pragma GCC diagnostic push
  191. #pragma GCC diagnostic ignored "-Wlong-long"
  192. #if defined(__clang__)
  193. #pragma GCC diagnostic ignored "-Wc++11-long-long"
  194. #endif
  195. #endif
  196. typedef signed long long drflac_int64;
  197. typedef unsigned long long drflac_uint64;
  198. #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
  199. #pragma GCC diagnostic pop
  200. #endif
  201. #endif
  202. #if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined (_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
  203. typedef drflac_uint64 drflac_uintptr;
  204. #else
  205. typedef drflac_uint32 drflac_uintptr;
  206. #endif
  207. typedef drflac_uint8 drflac_bool8;
  208. typedef drflac_uint32 drflac_bool32;
  209. #define DRFLAC_TRUE 1
  210. #define DRFLAC_FALSE 0
  211. #if !defined(DRFLAC_API)
  212. #if defined(DRFLAC_DLL)
  213. #if defined(_WIN32)
  214. #define DRFLAC_DLL_IMPORT __declspec(dllimport)
  215. #define DRFLAC_DLL_EXPORT __declspec(dllexport)
  216. #define DRFLAC_DLL_PRIVATE static
  217. #else
  218. #if defined(__GNUC__) && __GNUC__ >= 4
  219. #define DRFLAC_DLL_IMPORT __attribute__((visibility("default")))
  220. #define DRFLAC_DLL_EXPORT __attribute__((visibility("default")))
  221. #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
  222. #else
  223. #define DRFLAC_DLL_IMPORT
  224. #define DRFLAC_DLL_EXPORT
  225. #define DRFLAC_DLL_PRIVATE static
  226. #endif
  227. #endif
  228. #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
  229. #define DRFLAC_API DRFLAC_DLL_EXPORT
  230. #else
  231. #define DRFLAC_API DRFLAC_DLL_IMPORT
  232. #endif
  233. #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
  234. #else
  235. #define DRFLAC_API extern
  236. #define DRFLAC_PRIVATE static
  237. #endif
  238. #endif
  239. #if defined(_MSC_VER) && _MSC_VER >= 1700 /* Visual Studio 2012 */
  240. #define DRFLAC_DEPRECATED __declspec(deprecated)
  241. #elif (defined(__GNUC__) && __GNUC__ >= 4) /* GCC 4 */
  242. #define DRFLAC_DEPRECATED __attribute__((deprecated))
  243. #elif defined(__has_feature) /* Clang */
  244. #if __has_feature(attribute_deprecated)
  245. #define DRFLAC_DEPRECATED __attribute__((deprecated))
  246. #else
  247. #define DRFLAC_DEPRECATED
  248. #endif
  249. #else
  250. #define DRFLAC_DEPRECATED
  251. #endif
  252. DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
  253. DRFLAC_API const char* drflac_version_string(void);
  254. /*
  255. As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values means more speed,
  256. but also more memory. In my testing there is diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
  257. */
  258. #ifndef DR_FLAC_BUFFER_SIZE
  259. #define DR_FLAC_BUFFER_SIZE 4096
  260. #endif
  261. /* Check if we can enable 64-bit optimizations. */
  262. #if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
  263. #define DRFLAC_64BIT
  264. #endif
  265. #ifdef DRFLAC_64BIT
  266. typedef drflac_uint64 drflac_cache_t;
  267. #else
  268. typedef drflac_uint32 drflac_cache_t;
  269. #endif
  270. /* The various metadata block types. */
  271. #define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO 0
  272. #define DRFLAC_METADATA_BLOCK_TYPE_PADDING 1
  273. #define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION 2
  274. #define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE 3
  275. #define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT 4
  276. #define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET 5
  277. #define DRFLAC_METADATA_BLOCK_TYPE_PICTURE 6
  278. #define DRFLAC_METADATA_BLOCK_TYPE_INVALID 127
  279. /* The various picture types specified in the PICTURE block. */
  280. #define DRFLAC_PICTURE_TYPE_OTHER 0
  281. #define DRFLAC_PICTURE_TYPE_FILE_ICON 1
  282. #define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON 2
  283. #define DRFLAC_PICTURE_TYPE_COVER_FRONT 3
  284. #define DRFLAC_PICTURE_TYPE_COVER_BACK 4
  285. #define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE 5
  286. #define DRFLAC_PICTURE_TYPE_MEDIA 6
  287. #define DRFLAC_PICTURE_TYPE_LEAD_ARTIST 7
  288. #define DRFLAC_PICTURE_TYPE_ARTIST 8
  289. #define DRFLAC_PICTURE_TYPE_CONDUCTOR 9
  290. #define DRFLAC_PICTURE_TYPE_BAND 10
  291. #define DRFLAC_PICTURE_TYPE_COMPOSER 11
  292. #define DRFLAC_PICTURE_TYPE_LYRICIST 12
  293. #define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION 13
  294. #define DRFLAC_PICTURE_TYPE_DURING_RECORDING 14
  295. #define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE 15
  296. #define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE 16
  297. #define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH 17
  298. #define DRFLAC_PICTURE_TYPE_ILLUSTRATION 18
  299. #define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE 19
  300. #define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE 20
  301. typedef enum
  302. {
  303. drflac_container_native,
  304. drflac_container_ogg,
  305. drflac_container_unknown
  306. } drflac_container;
  307. typedef enum
  308. {
  309. drflac_seek_origin_start,
  310. drflac_seek_origin_current
  311. } drflac_seek_origin;
  312. /* Packing is important on this structure because we map this directly to the raw data within the SEEKTABLE metadata block. */
  313. #pragma pack(2)
  314. typedef struct
  315. {
  316. drflac_uint64 firstPCMFrame;
  317. drflac_uint64 flacFrameOffset; /* The offset from the first byte of the header of the first frame. */
  318. drflac_uint16 pcmFrameCount;
  319. } drflac_seekpoint;
  320. #pragma pack()
  321. typedef struct
  322. {
  323. drflac_uint16 minBlockSizeInPCMFrames;
  324. drflac_uint16 maxBlockSizeInPCMFrames;
  325. drflac_uint32 minFrameSizeInPCMFrames;
  326. drflac_uint32 maxFrameSizeInPCMFrames;
  327. drflac_uint32 sampleRate;
  328. drflac_uint8 channels;
  329. drflac_uint8 bitsPerSample;
  330. drflac_uint64 totalPCMFrameCount;
  331. drflac_uint8 md5[16];
  332. } drflac_streaminfo;
  333. typedef struct
  334. {
  335. /*
  336. The metadata type. Use this to know how to interpret the data below. Will be set to one of the
  337. DRFLAC_METADATA_BLOCK_TYPE_* tokens.
  338. */
  339. drflac_uint32 type;
  340. /*
  341. A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
  342. not modify the contents of this buffer. Use the structures below for more meaningful and structured
  343. information about the metadata. It's possible for this to be null.
  344. */
  345. const void* pRawData;
  346. /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
  347. drflac_uint32 rawDataSize;
  348. union
  349. {
  350. drflac_streaminfo streaminfo;
  351. struct
  352. {
  353. int unused;
  354. } padding;
  355. struct
  356. {
  357. drflac_uint32 id;
  358. const void* pData;
  359. drflac_uint32 dataSize;
  360. } application;
  361. struct
  362. {
  363. drflac_uint32 seekpointCount;
  364. const drflac_seekpoint* pSeekpoints;
  365. } seektable;
  366. struct
  367. {
  368. drflac_uint32 vendorLength;
  369. const char* vendor;
  370. drflac_uint32 commentCount;
  371. const void* pComments;
  372. } vorbis_comment;
  373. struct
  374. {
  375. char catalog[128];
  376. drflac_uint64 leadInSampleCount;
  377. drflac_bool32 isCD;
  378. drflac_uint8 trackCount;
  379. const void* pTrackData;
  380. } cuesheet;
  381. struct
  382. {
  383. drflac_uint32 type;
  384. drflac_uint32 mimeLength;
  385. const char* mime;
  386. drflac_uint32 descriptionLength;
  387. const char* description;
  388. drflac_uint32 width;
  389. drflac_uint32 height;
  390. drflac_uint32 colorDepth;
  391. drflac_uint32 indexColorCount;
  392. drflac_uint32 pictureDataSize;
  393. const drflac_uint8* pPictureData;
  394. } picture;
  395. } data;
  396. } drflac_metadata;
  397. /*
  398. Callback for when data needs to be read from the client.
  399. Parameters
  400. ----------
  401. pUserData (in)
  402. The user data that was passed to drflac_open() and family.
  403. pBufferOut (out)
  404. The output buffer.
  405. bytesToRead (in)
  406. The number of bytes to read.
  407. Return Value
  408. ------------
  409. The number of bytes actually read.
  410. Remarks
  411. -------
  412. A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
  413. you have reached the end of the stream.
  414. */
  415. typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);
  416. /*
  417. Callback for when data needs to be seeked.
  418. Parameters
  419. ----------
  420. pUserData (in)
  421. The user data that was passed to drflac_open() and family.
  422. offset (in)
  423. The number of bytes to move, relative to the origin. Will never be negative.
  424. origin (in)
  425. The origin of the seek - the current position or the start of the stream.
  426. Return Value
  427. ------------
  428. Whether or not the seek was successful.
  429. Remarks
  430. -------
  431. The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
  432. either drflac_seek_origin_start or drflac_seek_origin_current.
  433. When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
  434. and handled by returning DRFLAC_FALSE.
  435. */
  436. typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);
  437. /*
  438. Callback for when a metadata block is read.
  439. Parameters
  440. ----------
  441. pUserData (in)
  442. The user data that was passed to drflac_open() and family.
  443. pMetadata (in)
  444. A pointer to a structure containing the data of the metadata block.
  445. Remarks
  446. -------
  447. Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
  448. will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
  449. */
  450. typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);
  451. typedef struct
  452. {
  453. void* pUserData;
  454. void* (* onMalloc)(size_t sz, void* pUserData);
  455. void* (* onRealloc)(void* p, size_t sz, void* pUserData);
  456. void (* onFree)(void* p, void* pUserData);
  457. } drflac_allocation_callbacks;
  458. /* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
  459. typedef struct
  460. {
  461. const drflac_uint8* data;
  462. size_t dataSize;
  463. size_t currentReadPos;
  464. } drflac__memory_stream;
  465. /* Structure for internal use. Used for bit streaming. */
  466. typedef struct
  467. {
  468. /* The function to call when more data needs to be read. */
  469. drflac_read_proc onRead;
  470. /* The function to call when the current read position needs to be moved. */
  471. drflac_seek_proc onSeek;
  472. /* The user data to pass around to onRead and onSeek. */
  473. void* pUserData;
  474. /*
  475. The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
  476. stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
  477. or not the bistreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
  478. */
  479. size_t unalignedByteCount;
  480. /* The content of the unaligned bytes. */
  481. drflac_cache_t unalignedCache;
  482. /* The index of the next valid cache line in the "L2" cache. */
  483. drflac_uint32 nextL2Line;
  484. /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
  485. drflac_uint32 consumedBits;
  486. /*
  487. The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
  488. Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
  489. */
  490. drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
  491. drflac_cache_t cache;
  492. /*
  493. CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
  494. is reset to 0 at the beginning of each frame.
  495. */
  496. drflac_uint16 crc16;
  497. drflac_cache_t crc16Cache; /* A cache for optimizing CRC calculations. This is filled when when the L1 cache is reloaded. */
  498. drflac_uint32 crc16CacheIgnoredBytes; /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
  499. } drflac_bs;
  500. typedef struct
  501. {
  502. /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
  503. drflac_uint8 subframeType;
  504. /* The number of wasted bits per sample as specified by the sub-frame header. */
  505. drflac_uint8 wastedBitsPerSample;
  506. /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
  507. drflac_uint8 lpcOrder;
  508. /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
  509. drflac_int32* pSamplesS32;
  510. } drflac_subframe;
  511. typedef struct
  512. {
  513. /*
  514. If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
  515. always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
  516. */
  517. drflac_uint64 pcmFrameNumber;
  518. /*
  519. If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
  520. is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
  521. */
  522. drflac_uint32 flacFrameNumber;
  523. /* The sample rate of this frame. */
  524. drflac_uint32 sampleRate;
  525. /* The number of PCM frames in each sub-frame within this frame. */
  526. drflac_uint16 blockSizeInPCMFrames;
  527. /*
  528. The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
  529. will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
  530. */
  531. drflac_uint8 channelAssignment;
  532. /* The number of bits per sample within this frame. */
  533. drflac_uint8 bitsPerSample;
  534. /* The frame's CRC. */
  535. drflac_uint8 crc8;
  536. } drflac_frame_header;
  537. typedef struct
  538. {
  539. /* The header. */
  540. drflac_frame_header header;
  541. /*
  542. The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
  543. this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
  544. */
  545. drflac_uint32 pcmFramesRemaining;
  546. /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
  547. drflac_subframe subframes[8];
  548. } drflac_frame;
  549. typedef struct
  550. {
  551. /* The function to call when a metadata block is read. */
  552. drflac_meta_proc onMeta;
  553. /* The user data posted to the metadata callback function. */
  554. void* pUserDataMD;
  555. /* Memory allocation callbacks. */
  556. drflac_allocation_callbacks allocationCallbacks;
  557. /* The sample rate. Will be set to something like 44100. */
  558. drflac_uint32 sampleRate;
  559. /*
  560. The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
  561. value specified in the STREAMINFO block.
  562. */
  563. drflac_uint8 channels;
  564. /* The bits per sample. Will be set to something like 16, 24, etc. */
  565. drflac_uint8 bitsPerSample;
  566. /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
  567. drflac_uint16 maxBlockSizeInPCMFrames;
  568. /*
  569. The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
  570. the total PCM frame count is unknown. Likely the case with streams like internet radio.
  571. */
  572. drflac_uint64 totalPCMFrameCount;
  573. /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
  574. drflac_container container;
  575. /* The number of seekpoints in the seektable. */
  576. drflac_uint32 seekpointCount;
  577. /* Information about the frame the decoder is currently sitting on. */
  578. drflac_frame currentFLACFrame;
  579. /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
  580. drflac_uint64 currentPCMFrame;
  581. /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
  582. drflac_uint64 firstFLACFramePosInBytes;
  583. /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
  584. drflac__memory_stream memoryStream;
  585. /* A pointer to the decoded sample data. This is an offset of pExtraData. */
  586. drflac_int32* pDecodedSamples;
  587. /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
  588. drflac_seekpoint* pSeekpoints;
  589. /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
  590. void* _oggbs;
  591. /* Internal use only. Used for profiling and testing different seeking modes. */
  592. drflac_bool32 _noSeekTableSeek : 1;
  593. drflac_bool32 _noBinarySearchSeek : 1;
  594. drflac_bool32 _noBruteForceSeek : 1;
  595. /* The bit streamer. The raw FLAC data is fed through this object. */
  596. drflac_bs bs;
  597. /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
  598. drflac_uint8 pExtraData[1];
  599. } drflac;
  600. /*
  601. Opens a FLAC decoder.
  602. Parameters
  603. ----------
  604. onRead (in)
  605. The function to call when data needs to be read from the client.
  606. onSeek (in)
  607. The function to call when the read position of the client data needs to move.
  608. pUserData (in, optional)
  609. A pointer to application defined data that will be passed to onRead and onSeek.
  610. pAllocationCallbacks (in, optional)
  611. A pointer to application defined callbacks for managing memory allocations.
  612. Return Value
  613. ------------
  614. Returns a pointer to an object representing the decoder.
  615. Remarks
  616. -------
  617. Close the decoder with `drflac_close()`.
  618. `pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
  619. This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
  620. without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.
  621. This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
  622. from a block of memory respectively.
  623. The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.
  624. Use `drflac_open_with_metadata()` if you need access to metadata.
  625. Seek Also
  626. ---------
  627. drflac_open_file()
  628. drflac_open_memory()
  629. drflac_open_with_metadata()
  630. drflac_close()
  631. */
  632. DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
  633. /*
  634. Opens a FLAC stream with relaxed validation of the header block.
  635. Parameters
  636. ----------
  637. onRead (in)
  638. The function to call when data needs to be read from the client.
  639. onSeek (in)
  640. The function to call when the read position of the client data needs to move.
  641. container (in)
  642. Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.
  643. pUserData (in, optional)
  644. A pointer to application defined data that will be passed to onRead and onSeek.
  645. pAllocationCallbacks (in, optional)
  646. A pointer to application defined callbacks for managing memory allocations.
  647. Return Value
  648. ------------
  649. A pointer to an object representing the decoder.
  650. Remarks
  651. -------
  652. The same as drflac_open(), except attempts to open the stream even when a header block is not present.
  653. Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
  654. as that is for internal use only.
  655. Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
  656. force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.
  657. Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
  658. */
  659. DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
  660. /*
  661. Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).
  662. Parameters
  663. ----------
  664. onRead (in)
  665. The function to call when data needs to be read from the client.
  666. onSeek (in)
  667. The function to call when the read position of the client data needs to move.
  668. onMeta (in)
  669. The function to call for every metadata block.
  670. pUserData (in, optional)
  671. A pointer to application defined data that will be passed to onRead, onSeek and onMeta.
  672. pAllocationCallbacks (in, optional)
  673. A pointer to application defined callbacks for managing memory allocations.
  674. Return Value
  675. ------------
  676. A pointer to an object representing the decoder.
  677. Remarks
  678. -------
  679. Close the decoder with `drflac_close()`.
  680. `pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
  681. This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
  682. metadata block except for STREAMINFO and PADDING blocks.
  683. The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
  684. pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to discriminate against
  685. the different metadata types.
  686. The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.
  687. Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
  688. the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
  689. metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
  690. returned depending on whether or not the stream is being opened with metadata.
  691. Seek Also
  692. ---------
  693. drflac_open_file_with_metadata()
  694. drflac_open_memory_with_metadata()
  695. drflac_open()
  696. drflac_close()
  697. */
  698. DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
  699. /*
  700. The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.
  701. See Also
  702. --------
  703. drflac_open_with_metadata()
  704. drflac_open_relaxed()
  705. */
  706. DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
  707. /*
  708. Closes the given FLAC decoder.
  709. Parameters
  710. ----------
  711. pFlac (in)
  712. The decoder to close.
  713. Remarks
  714. -------
  715. This will destroy the decoder object.
  716. See Also
  717. --------
  718. drflac_open()
  719. drflac_open_with_metadata()
  720. drflac_open_file()
  721. drflac_open_file_w()
  722. drflac_open_file_with_metadata()
  723. drflac_open_file_with_metadata_w()
  724. drflac_open_memory()
  725. drflac_open_memory_with_metadata()
  726. */
  727. DRFLAC_API void drflac_close(drflac* pFlac);
  728. /*
  729. Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.
  730. Parameters
  731. ----------
  732. pFlac (in)
  733. The decoder.
  734. framesToRead (in)
  735. The number of PCM frames to read.
  736. pBufferOut (out, optional)
  737. A pointer to the buffer that will receive the decoded samples.
  738. Return Value
  739. ------------
  740. Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
  741. Remarks
  742. -------
  743. pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
  744. */
  745. DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
  746. /*
  747. Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
  748. Parameters
  749. ----------
  750. pFlac (in)
  751. The decoder.
  752. framesToRead (in)
  753. The number of PCM frames to read.
  754. pBufferOut (out, optional)
  755. A pointer to the buffer that will receive the decoded samples.
  756. Return Value
  757. ------------
  758. Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
  759. Remarks
  760. -------
  761. pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
  762. Note that this is lossy for streams where the bits per sample is larger than 16.
  763. */
  764. DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
  765. /*
  766. Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
  767. Parameters
  768. ----------
  769. pFlac (in)
  770. The decoder.
  771. framesToRead (in)
  772. The number of PCM frames to read.
  773. pBufferOut (out, optional)
  774. A pointer to the buffer that will receive the decoded samples.
  775. Return Value
  776. ------------
  777. Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
  778. Remarks
  779. -------
  780. pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
  781. Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
  782. */
  783. DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
  784. /*
  785. Seeks to the PCM frame at the given index.
  786. Parameters
  787. ----------
  788. pFlac (in)
  789. The decoder.
  790. pcmFrameIndex (in)
  791. The index of the PCM frame to seek to. See notes below.
  792. Return Value
  793. -------------
  794. `DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
  795. */
  796. DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
  797. #ifndef DR_FLAC_NO_STDIO
  798. /*
  799. Opens a FLAC decoder from the file at the given path.
  800. Parameters
  801. ----------
  802. pFileName (in)
  803. The path of the file to open, either absolute or relative to the current directory.
  804. pAllocationCallbacks (in, optional)
  805. A pointer to application defined callbacks for managing memory allocations.
  806. Return Value
  807. ------------
  808. A pointer to an object representing the decoder.
  809. Remarks
  810. -------
  811. Close the decoder with drflac_close().
  812. Remarks
  813. -------
  814. This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
  815. at any given time, so keep this mind if you have many decoders open at the same time.
  816. See Also
  817. --------
  818. drflac_open_file_with_metadata()
  819. drflac_open()
  820. drflac_close()
  821. */
  822. DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
  823. DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
  824. /*
  825. Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.)
  826. Parameters
  827. ----------
  828. pFileName (in)
  829. The path of the file to open, either absolute or relative to the current directory.
  830. pAllocationCallbacks (in, optional)
  831. A pointer to application defined callbacks for managing memory allocations.
  832. onMeta (in)
  833. The callback to fire for each metadata block.
  834. pUserData (in)
  835. A pointer to the user data to pass to the metadata callback.
  836. pAllocationCallbacks (in)
  837. A pointer to application defined callbacks for managing memory allocations.
  838. Remarks
  839. -------
  840. Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
  841. See Also
  842. --------
  843. drflac_open_with_metadata()
  844. drflac_open()
  845. drflac_close()
  846. */
  847. DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
  848. DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
  849. #endif
  850. /*
  851. Opens a FLAC decoder from a pre-allocated block of memory
  852. Parameters
  853. ----------
  854. pData (in)
  855. A pointer to the raw encoded FLAC data.
  856. dataSize (in)
  857. The size in bytes of `data`.
  858. pAllocationCallbacks (in)
  859. A pointer to application defined callbacks for managing memory allocations.
  860. Return Value
  861. ------------
  862. A pointer to an object representing the decoder.
  863. Remarks
  864. -------
  865. This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
  866. See Also
  867. --------
  868. drflac_open()
  869. drflac_close()
  870. */
  871. DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
  872. /*
  873. Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.)
  874. Parameters
  875. ----------
  876. pData (in)
  877. A pointer to the raw encoded FLAC data.
  878. dataSize (in)
  879. The size in bytes of `data`.
  880. onMeta (in)
  881. The callback to fire for each metadata block.
  882. pUserData (in)
  883. A pointer to the user data to pass to the metadata callback.
  884. pAllocationCallbacks (in)
  885. A pointer to application defined callbacks for managing memory allocations.
  886. Remarks
  887. -------
  888. Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
  889. See Also
  890. -------
  891. drflac_open_with_metadata()
  892. drflac_open()
  893. drflac_close()
  894. */
  895. DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
  896. /* High Level APIs */
  897. /*
  898. Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
  899. pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
  900. You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
  901. case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
  902. Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
  903. read samples into a dynamically sized buffer on the heap until no samples are left.
  904. Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
  905. */
  906. DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
  907. /* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
  908. DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
  909. /* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
  910. DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
  911. #ifndef DR_FLAC_NO_STDIO
  912. /* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
  913. DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
  914. /* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
  915. DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
  916. /* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
  917. DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
  918. #endif
  919. /* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
  920. DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
  921. /* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
  922. DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
  923. /* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
  924. DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
  925. /*
  926. Frees memory that was allocated internally by dr_flac.
  927. Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
  928. */
  929. DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
  930. /* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
  931. typedef struct
  932. {
  933. drflac_uint32 countRemaining;
  934. const char* pRunningData;
  935. } drflac_vorbis_comment_iterator;
  936. /*
  937. Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
  938. metadata block.
  939. */
  940. DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
  941. /*
  942. Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
  943. returned string is NOT null terminated.
  944. */
  945. DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
  946. /* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
  947. typedef struct
  948. {
  949. drflac_uint32 countRemaining;
  950. const char* pRunningData;
  951. } drflac_cuesheet_track_iterator;
  952. /* Packing is important on this structure because we map this directly to the raw data within the CUESHEET metadata block. */
  953. #pragma pack(4)
  954. typedef struct
  955. {
  956. drflac_uint64 offset;
  957. drflac_uint8 index;
  958. drflac_uint8 reserved[3];
  959. } drflac_cuesheet_track_index;
  960. #pragma pack()
  961. typedef struct
  962. {
  963. drflac_uint64 offset;
  964. drflac_uint8 trackNumber;
  965. char ISRC[12];
  966. drflac_bool8 isAudio;
  967. drflac_bool8 preEmphasis;
  968. drflac_uint8 indexCount;
  969. const drflac_cuesheet_track_index* pIndexPoints;
  970. } drflac_cuesheet_track;
  971. /*
  972. Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
  973. block.
  974. */
  975. DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
  976. /* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more comments. */
  977. DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
  978. #ifdef __cplusplus
  979. }
  980. #endif
  981. #endif /* dr_flac_h */
  982. /************************************************************************************************************************************************************
  983. ************************************************************************************************************************************************************
  984. IMPLEMENTATION
  985. ************************************************************************************************************************************************************
  986. ************************************************************************************************************************************************************/
  987. #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
  988. #ifndef dr_flac_c
  989. #define dr_flac_c
  990. /* Disable some annoying warnings. */
  991. #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
  992. #pragma GCC diagnostic push
  993. #if __GNUC__ >= 7
  994. #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
  995. #endif
  996. #endif
  997. #ifdef __linux__
  998. #ifndef _BSD_SOURCE
  999. #define _BSD_SOURCE
  1000. #endif
  1001. #ifndef _DEFAULT_SOURCE
  1002. #define _DEFAULT_SOURCE
  1003. #endif
  1004. #ifndef __USE_BSD
  1005. #define __USE_BSD
  1006. #endif
  1007. #include <endian.h>
  1008. #endif
  1009. #include <stdlib.h>
  1010. #include <string.h>
  1011. #ifdef _MSC_VER
  1012. #define DRFLAC_INLINE __forceinline
  1013. #elif defined(__GNUC__)
  1014. /*
  1015. I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
  1016. the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
  1017. case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
  1018. command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
  1019. I am using "__inline__" only when we're compiling in strict ANSI mode.
  1020. */
  1021. #if defined(__STRICT_ANSI__)
  1022. #define DRFLAC_INLINE __inline__ __attribute__((always_inline))
  1023. #else
  1024. #define DRFLAC_INLINE inline __attribute__((always_inline))
  1025. #endif
  1026. #elif defined(__WATCOMC__)
  1027. #define DRFLAC_INLINE __inline
  1028. #else
  1029. #define DRFLAC_INLINE
  1030. #endif
  1031. /* CPU architecture. */
  1032. #if defined(__x86_64__) || defined(_M_X64)
  1033. #define DRFLAC_X64
  1034. #elif defined(__i386) || defined(_M_IX86)
  1035. #define DRFLAC_X86
  1036. #elif defined(__arm__) || defined(_M_ARM) || defined(_M_ARM64)
  1037. #define DRFLAC_ARM
  1038. #endif
  1039. /*
  1040. Intrinsics Support
  1041. There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with
  1042. "error: shift must be an immediate"
  1043. Unfortuantely dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
  1044. */
  1045. #if !defined(DR_FLAC_NO_SIMD)
  1046. #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
  1047. #if defined(_MSC_VER) && !defined(__clang__)
  1048. /* MSVC. */
  1049. #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2) /* 2005 */
  1050. #define DRFLAC_SUPPORT_SSE2
  1051. #endif
  1052. #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41) /* 2010 */
  1053. #define DRFLAC_SUPPORT_SSE41
  1054. #endif
  1055. #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
  1056. /* Assume GNUC-style. */
  1057. #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
  1058. #define DRFLAC_SUPPORT_SSE2
  1059. #endif
  1060. #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
  1061. #define DRFLAC_SUPPORT_SSE41
  1062. #endif
  1063. #endif
  1064. /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
  1065. #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
  1066. #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
  1067. #define DRFLAC_SUPPORT_SSE2
  1068. #endif
  1069. #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
  1070. #define DRFLAC_SUPPORT_SSE41
  1071. #endif
  1072. #endif
  1073. #if defined(DRFLAC_SUPPORT_SSE41)
  1074. #include <smmintrin.h>
  1075. #elif defined(DRFLAC_SUPPORT_SSE2)
  1076. #include <emmintrin.h>
  1077. #endif
  1078. #endif
  1079. #if defined(DRFLAC_ARM)
  1080. #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
  1081. #define DRFLAC_SUPPORT_NEON
  1082. #endif
  1083. /* Fall back to looking for the #include file. */
  1084. #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
  1085. #if !defined(DRFLAC_SUPPORT_NEON) && !defined(DRFLAC_NO_NEON) && __has_include(<arm_neon.h>)
  1086. #define DRFLAC_SUPPORT_NEON
  1087. #endif
  1088. #endif
  1089. #if defined(DRFLAC_SUPPORT_NEON)
  1090. #include <arm_neon.h>
  1091. #endif
  1092. #endif
  1093. #endif
  1094. /* Compile-time CPU feature support. */
  1095. #if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
  1096. #if defined(_MSC_VER) && !defined(__clang__)
  1097. #if _MSC_VER >= 1400
  1098. #include <intrin.h>
  1099. static void drflac__cpuid(int info[4], int fid)
  1100. {
  1101. __cpuid(info, fid);
  1102. }
  1103. #else
  1104. #define DRFLAC_NO_CPUID
  1105. #endif
  1106. #else
  1107. #if defined(__GNUC__) || defined(__clang__)
  1108. static void drflac__cpuid(int info[4], int fid)
  1109. {
  1110. /*
  1111. It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
  1112. specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
  1113. supporting different assembly dialects.
  1114. What's basically happening is that we're saving and restoring the ebx register manually.
  1115. */
  1116. #if defined(DRFLAC_X86) && defined(__PIC__)
  1117. __asm__ __volatile__ (
  1118. "xchg{l} {%%}ebx, %k1;"
  1119. "cpuid;"
  1120. "xchg{l} {%%}ebx, %k1;"
  1121. : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
  1122. );
  1123. #else
  1124. __asm__ __volatile__ (
  1125. "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
  1126. );
  1127. #endif
  1128. }
  1129. #else
  1130. #define DRFLAC_NO_CPUID
  1131. #endif
  1132. #endif
  1133. #else
  1134. #define DRFLAC_NO_CPUID
  1135. #endif
  1136. static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
  1137. {
  1138. #if defined(DRFLAC_SUPPORT_SSE2)
  1139. #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
  1140. #if defined(DRFLAC_X64)
  1141. return DRFLAC_TRUE; /* 64-bit targets always support SSE2. */
  1142. #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
  1143. return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
  1144. #else
  1145. #if defined(DRFLAC_NO_CPUID)
  1146. return DRFLAC_FALSE;
  1147. #else
  1148. int info[4];
  1149. drflac__cpuid(info, 1);
  1150. return (info[3] & (1 << 26)) != 0;
  1151. #endif
  1152. #endif
  1153. #else
  1154. return DRFLAC_FALSE; /* SSE2 is only supported on x86 and x64 architectures. */
  1155. #endif
  1156. #else
  1157. return DRFLAC_FALSE; /* No compiler support. */
  1158. #endif
  1159. }
  1160. static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
  1161. {
  1162. #if defined(DRFLAC_SUPPORT_SSE41)
  1163. #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
  1164. #if defined(DRFLAC_X64)
  1165. return DRFLAC_TRUE; /* 64-bit targets always support SSE4.1. */
  1166. #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE4_1__)
  1167. return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE41 code we can assume support. */
  1168. #else
  1169. #if defined(DRFLAC_NO_CPUID)
  1170. return DRFLAC_FALSE;
  1171. #else
  1172. int info[4];
  1173. drflac__cpuid(info, 1);
  1174. return (info[2] & (1 << 19)) != 0;
  1175. #endif
  1176. #endif
  1177. #else
  1178. return DRFLAC_FALSE; /* SSE41 is only supported on x86 and x64 architectures. */
  1179. #endif
  1180. #else
  1181. return DRFLAC_FALSE; /* No compiler support. */
  1182. #endif
  1183. }
  1184. #if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
  1185. #define DRFLAC_HAS_LZCNT_INTRINSIC
  1186. #elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
  1187. #define DRFLAC_HAS_LZCNT_INTRINSIC
  1188. #elif defined(__clang__)
  1189. #if defined(__has_builtin)
  1190. #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
  1191. #define DRFLAC_HAS_LZCNT_INTRINSIC
  1192. #endif
  1193. #endif
  1194. #endif
  1195. #if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
  1196. #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
  1197. #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
  1198. #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
  1199. #elif defined(__clang__)
  1200. #if defined(__has_builtin)
  1201. #if __has_builtin(__builtin_bswap16)
  1202. #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
  1203. #endif
  1204. #if __has_builtin(__builtin_bswap32)
  1205. #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
  1206. #endif
  1207. #if __has_builtin(__builtin_bswap64)
  1208. #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
  1209. #endif
  1210. #endif
  1211. #elif defined(__GNUC__)
  1212. #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
  1213. #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
  1214. #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
  1215. #endif
  1216. #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
  1217. #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
  1218. #endif
  1219. #elif defined(__WATCOMC__) && defined(__386__)
  1220. #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
  1221. #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
  1222. #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
  1223. extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
  1224. extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
  1225. extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
  1226. #pragma aux _watcom_bswap16 = \
  1227. "xchg al, ah" \
  1228. parm [ax] \
  1229. modify [ax];
  1230. #pragma aux _watcom_bswap32 = \
  1231. "bswap eax" \
  1232. parm [eax] \
  1233. modify [eax];
  1234. #pragma aux _watcom_bswap64 = \
  1235. "bswap eax" \
  1236. "bswap edx" \
  1237. "xchg eax,edx" \
  1238. parm [eax edx] \
  1239. modify [eax edx];
  1240. #endif
  1241. /* Standard library stuff. */
  1242. #ifndef DRFLAC_ASSERT
  1243. #include <assert.h>
  1244. #define DRFLAC_ASSERT(expression) assert(expression)
  1245. #endif
  1246. #ifndef DRFLAC_MALLOC
  1247. #define DRFLAC_MALLOC(sz) malloc((sz))
  1248. #endif
  1249. #ifndef DRFLAC_REALLOC
  1250. #define DRFLAC_REALLOC(p, sz) realloc((p), (sz))
  1251. #endif
  1252. #ifndef DRFLAC_FREE
  1253. #define DRFLAC_FREE(p) free((p))
  1254. #endif
  1255. #ifndef DRFLAC_COPY_MEMORY
  1256. #define DRFLAC_COPY_MEMORY(dst, src, sz) memcpy((dst), (src), (sz))
  1257. #endif
  1258. #ifndef DRFLAC_ZERO_MEMORY
  1259. #define DRFLAC_ZERO_MEMORY(p, sz) memset((p), 0, (sz))
  1260. #endif
  1261. #ifndef DRFLAC_ZERO_OBJECT
  1262. #define DRFLAC_ZERO_OBJECT(p) DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
  1263. #endif
  1264. #define DRFLAC_MAX_SIMD_VECTOR_SIZE 64 /* 64 for AVX-512 in the future. */
  1265. typedef drflac_int32 drflac_result;
  1266. #define DRFLAC_SUCCESS 0
  1267. #define DRFLAC_ERROR -1 /* A generic error. */
  1268. #define DRFLAC_INVALID_ARGS -2
  1269. #define DRFLAC_INVALID_OPERATION -3
  1270. #define DRFLAC_OUT_OF_MEMORY -4
  1271. #define DRFLAC_OUT_OF_RANGE -5
  1272. #define DRFLAC_ACCESS_DENIED -6
  1273. #define DRFLAC_DOES_NOT_EXIST -7
  1274. #define DRFLAC_ALREADY_EXISTS -8
  1275. #define DRFLAC_TOO_MANY_OPEN_FILES -9
  1276. #define DRFLAC_INVALID_FILE -10
  1277. #define DRFLAC_TOO_BIG -11
  1278. #define DRFLAC_PATH_TOO_LONG -12
  1279. #define DRFLAC_NAME_TOO_LONG -13
  1280. #define DRFLAC_NOT_DIRECTORY -14
  1281. #define DRFLAC_IS_DIRECTORY -15
  1282. #define DRFLAC_DIRECTORY_NOT_EMPTY -16
  1283. #define DRFLAC_END_OF_FILE -17
  1284. #define DRFLAC_NO_SPACE -18
  1285. #define DRFLAC_BUSY -19
  1286. #define DRFLAC_IO_ERROR -20
  1287. #define DRFLAC_INTERRUPT -21
  1288. #define DRFLAC_UNAVAILABLE -22
  1289. #define DRFLAC_ALREADY_IN_USE -23
  1290. #define DRFLAC_BAD_ADDRESS -24
  1291. #define DRFLAC_BAD_SEEK -25
  1292. #define DRFLAC_BAD_PIPE -26
  1293. #define DRFLAC_DEADLOCK -27
  1294. #define DRFLAC_TOO_MANY_LINKS -28
  1295. #define DRFLAC_NOT_IMPLEMENTED -29
  1296. #define DRFLAC_NO_MESSAGE -30
  1297. #define DRFLAC_BAD_MESSAGE -31
  1298. #define DRFLAC_NO_DATA_AVAILABLE -32
  1299. #define DRFLAC_INVALID_DATA -33
  1300. #define DRFLAC_TIMEOUT -34
  1301. #define DRFLAC_NO_NETWORK -35
  1302. #define DRFLAC_NOT_UNIQUE -36
  1303. #define DRFLAC_NOT_SOCKET -37
  1304. #define DRFLAC_NO_ADDRESS -38
  1305. #define DRFLAC_BAD_PROTOCOL -39
  1306. #define DRFLAC_PROTOCOL_UNAVAILABLE -40
  1307. #define DRFLAC_PROTOCOL_NOT_SUPPORTED -41
  1308. #define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED -42
  1309. #define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED -43
  1310. #define DRFLAC_SOCKET_NOT_SUPPORTED -44
  1311. #define DRFLAC_CONNECTION_RESET -45
  1312. #define DRFLAC_ALREADY_CONNECTED -46
  1313. #define DRFLAC_NOT_CONNECTED -47
  1314. #define DRFLAC_CONNECTION_REFUSED -48
  1315. #define DRFLAC_NO_HOST -49
  1316. #define DRFLAC_IN_PROGRESS -50
  1317. #define DRFLAC_CANCELLED -51
  1318. #define DRFLAC_MEMORY_ALREADY_MAPPED -52
  1319. #define DRFLAC_AT_END -53
  1320. #define DRFLAC_CRC_MISMATCH -128
  1321. #define DRFLAC_SUBFRAME_CONSTANT 0
  1322. #define DRFLAC_SUBFRAME_VERBATIM 1
  1323. #define DRFLAC_SUBFRAME_FIXED 8
  1324. #define DRFLAC_SUBFRAME_LPC 32
  1325. #define DRFLAC_SUBFRAME_RESERVED 255
  1326. #define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE 0
  1327. #define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1
  1328. #define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT 0
  1329. #define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE 8
  1330. #define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE 9
  1331. #define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE 10
  1332. #define drflac_align(x, a) ((((x) + (a) - 1) / (a)) * (a))
  1333. DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
  1334. {
  1335. if (pMajor) {
  1336. *pMajor = DRFLAC_VERSION_MAJOR;
  1337. }
  1338. if (pMinor) {
  1339. *pMinor = DRFLAC_VERSION_MINOR;
  1340. }
  1341. if (pRevision) {
  1342. *pRevision = DRFLAC_VERSION_REVISION;
  1343. }
  1344. }
  1345. DRFLAC_API const char* drflac_version_string(void)
  1346. {
  1347. return DRFLAC_VERSION_STRING;
  1348. }
  1349. /* CPU caps. */
  1350. #if defined(__has_feature)
  1351. #if __has_feature(thread_sanitizer)
  1352. #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
  1353. #else
  1354. #define DRFLAC_NO_THREAD_SANITIZE
  1355. #endif
  1356. #else
  1357. #define DRFLAC_NO_THREAD_SANITIZE
  1358. #endif
  1359. #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
  1360. static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
  1361. #endif
  1362. #ifndef DRFLAC_NO_CPUID
  1363. static drflac_bool32 drflac__gIsSSE2Supported = DRFLAC_FALSE;
  1364. static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;
  1365. /*
  1366. I've had a bug report that Clang's ThreadSanitizer presents a warning in this function. Having reviewed this, this does
  1367. actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade off of
  1368. complicating internal API's by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
  1369. just going to disable these warnings. This is disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
  1370. */
  1371. DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
  1372. {
  1373. static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;
  1374. if (!isCPUCapsInitialized) {
  1375. /* LZCNT */
  1376. #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
  1377. int info[4] = {0};
  1378. drflac__cpuid(info, 0x80000001);
  1379. drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
  1380. #endif
  1381. /* SSE2 */
  1382. drflac__gIsSSE2Supported = drflac_has_sse2();
  1383. /* SSE4.1 */
  1384. drflac__gIsSSE41Supported = drflac_has_sse41();
  1385. /* Initialized. */
  1386. isCPUCapsInitialized = DRFLAC_TRUE;
  1387. }
  1388. }
  1389. #else
  1390. static drflac_bool32 drflac__gIsNEONSupported = DRFLAC_FALSE;
  1391. static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
  1392. {
  1393. #if defined(DRFLAC_SUPPORT_NEON)
  1394. #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
  1395. #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
  1396. return DRFLAC_TRUE; /* If the compiler is allowed to freely generate NEON code we can assume support. */
  1397. #else
  1398. /* TODO: Runtime check. */
  1399. return DRFLAC_FALSE;
  1400. #endif
  1401. #else
  1402. return DRFLAC_FALSE; /* NEON is only supported on ARM architectures. */
  1403. #endif
  1404. #else
  1405. return DRFLAC_FALSE; /* No compiler support. */
  1406. #endif
  1407. }
  1408. DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
  1409. {
  1410. drflac__gIsNEONSupported = drflac__has_neon();
  1411. #if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
  1412. drflac__gIsLZCNTSupported = DRFLAC_TRUE;
  1413. #endif
  1414. }
  1415. #endif
  1416. /* Endian Management */
  1417. static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
  1418. {
  1419. #if defined(DRFLAC_X86) || defined(DRFLAC_X64)
  1420. return DRFLAC_TRUE;
  1421. #elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
  1422. return DRFLAC_TRUE;
  1423. #else
  1424. int n = 1;
  1425. return (*(char*)&n) == 1;
  1426. #endif
  1427. }
  1428. static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
  1429. {
  1430. #ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
  1431. #if defined(_MSC_VER) && !defined(__clang__)
  1432. return _byteswap_ushort(n);
  1433. #elif defined(__GNUC__) || defined(__clang__)
  1434. return __builtin_bswap16(n);
  1435. #elif defined(__WATCOMC__) && defined(__386__)
  1436. return _watcom_bswap16(n);
  1437. #else
  1438. #error "This compiler does not support the byte swap intrinsic."
  1439. #endif
  1440. #else
  1441. return ((n & 0xFF00) >> 8) |
  1442. ((n & 0x00FF) << 8);
  1443. #endif
  1444. }
  1445. static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
  1446. {
  1447. #ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
  1448. #if defined(_MSC_VER) && !defined(__clang__)
  1449. return _byteswap_ulong(n);
  1450. #elif defined(__GNUC__) || defined(__clang__)
  1451. #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(DRFLAC_64BIT) /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
  1452. /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
  1453. drflac_uint32 r;
  1454. __asm__ __volatile__ (
  1455. #if defined(DRFLAC_64BIT)
  1456. "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
  1457. #else
  1458. "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
  1459. #endif
  1460. );
  1461. return r;
  1462. #else
  1463. return __builtin_bswap32(n);
  1464. #endif
  1465. #elif defined(__WATCOMC__) && defined(__386__)
  1466. return _watcom_bswap32(n);
  1467. #else
  1468. #error "This compiler does not support the byte swap intrinsic."
  1469. #endif
  1470. #else
  1471. return ((n & 0xFF000000) >> 24) |
  1472. ((n & 0x00FF0000) >> 8) |
  1473. ((n & 0x0000FF00) << 8) |
  1474. ((n & 0x000000FF) << 24);
  1475. #endif
  1476. }
  1477. static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
  1478. {
  1479. #ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
  1480. #if defined(_MSC_VER) && !defined(__clang__)
  1481. return _byteswap_uint64(n);
  1482. #elif defined(__GNUC__) || defined(__clang__)
  1483. return __builtin_bswap64(n);
  1484. #elif defined(__WATCOMC__) && defined(__386__)
  1485. return _watcom_bswap64(n);
  1486. #else
  1487. #error "This compiler does not support the byte swap intrinsic."
  1488. #endif
  1489. #else
  1490. /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
  1491. return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
  1492. ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
  1493. ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
  1494. ((n & ((drflac_uint64)0x000000FF << 32)) >> 8) |
  1495. ((n & ((drflac_uint64)0xFF000000 )) << 8) |
  1496. ((n & ((drflac_uint64)0x00FF0000 )) << 24) |
  1497. ((n & ((drflac_uint64)0x0000FF00 )) << 40) |
  1498. ((n & ((drflac_uint64)0x000000FF )) << 56);
  1499. #endif
  1500. }
  1501. static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
  1502. {
  1503. if (drflac__is_little_endian()) {
  1504. return drflac__swap_endian_uint16(n);
  1505. }
  1506. return n;
  1507. }
  1508. static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
  1509. {
  1510. if (drflac__is_little_endian()) {
  1511. return drflac__swap_endian_uint32(n);
  1512. }
  1513. return n;
  1514. }
  1515. static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
  1516. {
  1517. if (drflac__is_little_endian()) {
  1518. return drflac__swap_endian_uint64(n);
  1519. }
  1520. return n;
  1521. }
  1522. static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
  1523. {
  1524. if (!drflac__is_little_endian()) {
  1525. return drflac__swap_endian_uint32(n);
  1526. }
  1527. return n;
  1528. }
  1529. static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
  1530. {
  1531. drflac_uint32 result = 0;
  1532. result |= (n & 0x7F000000) >> 3;
  1533. result |= (n & 0x007F0000) >> 2;
  1534. result |= (n & 0x00007F00) >> 1;
  1535. result |= (n & 0x0000007F) >> 0;
  1536. return result;
  1537. }
  1538. /* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
  1539. static drflac_uint8 drflac__crc8_table[] = {
  1540. 0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
  1541. 0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
  1542. 0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
  1543. 0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
  1544. 0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
  1545. 0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
  1546. 0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
  1547. 0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
  1548. 0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
  1549. 0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
  1550. 0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
  1551. 0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
  1552. 0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
  1553. 0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
  1554. 0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
  1555. 0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
  1556. };
  1557. static drflac_uint16 drflac__crc16_table[] = {
  1558. 0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
  1559. 0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
  1560. 0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
  1561. 0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
  1562. 0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
  1563. 0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
  1564. 0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
  1565. 0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
  1566. 0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
  1567. 0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
  1568. 0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
  1569. 0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
  1570. 0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
  1571. 0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
  1572. 0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
  1573. 0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
  1574. 0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
  1575. 0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
  1576. 0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
  1577. 0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
  1578. 0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
  1579. 0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
  1580. 0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
  1581. 0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
  1582. 0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
  1583. 0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
  1584. 0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
  1585. 0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
  1586. 0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
  1587. 0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
  1588. 0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
  1589. 0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
  1590. };
  1591. static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
  1592. {
  1593. return drflac__crc8_table[crc ^ data];
  1594. }
  1595. static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
  1596. {
  1597. #ifdef DR_FLAC_NO_CRC
  1598. (void)crc;
  1599. (void)data;
  1600. (void)count;
  1601. return 0;
  1602. #else
  1603. #if 0
  1604. /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
  1605. drflac_uint8 p = 0x07;
  1606. for (int i = count-1; i >= 0; --i) {
  1607. drflac_uint8 bit = (data & (1 << i)) >> i;
  1608. if (crc & 0x80) {
  1609. crc = ((crc << 1) | bit) ^ p;
  1610. } else {
  1611. crc = ((crc << 1) | bit);
  1612. }
  1613. }
  1614. return crc;
  1615. #else
  1616. drflac_uint32 wholeBytes;
  1617. drflac_uint32 leftoverBits;
  1618. drflac_uint64 leftoverDataMask;
  1619. static drflac_uint64 leftoverDataMaskTable[8] = {
  1620. 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
  1621. };
  1622. DRFLAC_ASSERT(count <= 32);
  1623. wholeBytes = count >> 3;
  1624. leftoverBits = count - (wholeBytes*8);
  1625. leftoverDataMask = leftoverDataMaskTable[leftoverBits];
  1626. switch (wholeBytes) {
  1627. case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
  1628. case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
  1629. case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
  1630. case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
  1631. case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
  1632. }
  1633. return crc;
  1634. #endif
  1635. #endif
  1636. }
  1637. static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
  1638. {
  1639. return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
  1640. }
  1641. static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
  1642. {
  1643. #ifdef DRFLAC_64BIT
  1644. crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
  1645. crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
  1646. crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
  1647. crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
  1648. #endif
  1649. crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
  1650. crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
  1651. crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
  1652. crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
  1653. return crc;
  1654. }
  1655. static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
  1656. {
  1657. switch (byteCount)
  1658. {
  1659. #ifdef DRFLAC_64BIT
  1660. case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
  1661. case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
  1662. case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
  1663. case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
  1664. #endif
  1665. case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
  1666. case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
  1667. case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
  1668. case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
  1669. }
  1670. return crc;
  1671. }
  1672. #if 0
  1673. static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
  1674. {
  1675. #ifdef DR_FLAC_NO_CRC
  1676. (void)crc;
  1677. (void)data;
  1678. (void)count;
  1679. return 0;
  1680. #else
  1681. #if 0
  1682. /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
  1683. drflac_uint16 p = 0x8005;
  1684. for (int i = count-1; i >= 0; --i) {
  1685. drflac_uint16 bit = (data & (1ULL << i)) >> i;
  1686. if (r & 0x8000) {
  1687. r = ((r << 1) | bit) ^ p;
  1688. } else {
  1689. r = ((r << 1) | bit);
  1690. }
  1691. }
  1692. return crc;
  1693. #else
  1694. drflac_uint32 wholeBytes;
  1695. drflac_uint32 leftoverBits;
  1696. drflac_uint64 leftoverDataMask;
  1697. static drflac_uint64 leftoverDataMaskTable[8] = {
  1698. 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
  1699. };
  1700. DRFLAC_ASSERT(count <= 64);
  1701. wholeBytes = count >> 3;
  1702. leftoverBits = count & 7;
  1703. leftoverDataMask = leftoverDataMaskTable[leftoverBits];
  1704. switch (wholeBytes) {
  1705. default:
  1706. case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
  1707. case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
  1708. case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
  1709. case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
  1710. case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
  1711. }
  1712. return crc;
  1713. #endif
  1714. #endif
  1715. }
  1716. static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
  1717. {
  1718. #ifdef DR_FLAC_NO_CRC
  1719. (void)crc;
  1720. (void)data;
  1721. (void)count;
  1722. return 0;
  1723. #else
  1724. drflac_uint32 wholeBytes;
  1725. drflac_uint32 leftoverBits;
  1726. drflac_uint64 leftoverDataMask;
  1727. static drflac_uint64 leftoverDataMaskTable[8] = {
  1728. 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
  1729. };
  1730. DRFLAC_ASSERT(count <= 64);
  1731. wholeBytes = count >> 3;
  1732. leftoverBits = count & 7;
  1733. leftoverDataMask = leftoverDataMaskTable[leftoverBits];
  1734. switch (wholeBytes) {
  1735. default:
  1736. case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits))); /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
  1737. case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
  1738. case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
  1739. case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
  1740. case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 ) << leftoverBits)) >> (24 + leftoverBits)));
  1741. case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 ) << leftoverBits)) >> (16 + leftoverBits)));
  1742. case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 ) << leftoverBits)) >> ( 8 + leftoverBits)));
  1743. case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF ) << leftoverBits)) >> ( 0 + leftoverBits)));
  1744. case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
  1745. }
  1746. return crc;
  1747. #endif
  1748. }
  1749. static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
  1750. {
  1751. #ifdef DRFLAC_64BIT
  1752. return drflac_crc16__64bit(crc, data, count);
  1753. #else
  1754. return drflac_crc16__32bit(crc, data, count);
  1755. #endif
  1756. }
  1757. #endif
  1758. #ifdef DRFLAC_64BIT
  1759. #define drflac__be2host__cache_line drflac__be2host_64
  1760. #else
  1761. #define drflac__be2host__cache_line drflac__be2host_32
  1762. #endif
  1763. /*
  1764. BIT READING ATTEMPT #2
  1765. This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
  1766. on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
  1767. is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
  1768. array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
  1769. from onRead() is read into.
  1770. */
  1771. #define DRFLAC_CACHE_L1_SIZE_BYTES(bs) (sizeof((bs)->cache))
  1772. #define DRFLAC_CACHE_L1_SIZE_BITS(bs) (sizeof((bs)->cache)*8)
  1773. #define DRFLAC_CACHE_L1_BITS_REMAINING(bs) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
  1774. #define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount) (~((~(drflac_cache_t)0) >> (_bitCount)))
  1775. #define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
  1776. #define DRFLAC_CACHE_L1_SELECT(bs, _bitCount) (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
  1777. #define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
  1778. #define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
  1779. #define DRFLAC_CACHE_L2_SIZE_BYTES(bs) (sizeof((bs)->cacheL2))
  1780. #define DRFLAC_CACHE_L2_LINE_COUNT(bs) (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
  1781. #define DRFLAC_CACHE_L2_LINES_REMAINING(bs) (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
  1782. #ifndef DR_FLAC_NO_CRC
  1783. static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
  1784. {
  1785. bs->crc16 = 0;
  1786. bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
  1787. }
  1788. static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
  1789. {
  1790. if (bs->crc16CacheIgnoredBytes == 0) {
  1791. bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
  1792. } else {
  1793. bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
  1794. bs->crc16CacheIgnoredBytes = 0;
  1795. }
  1796. }
  1797. static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
  1798. {
  1799. /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
  1800. DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);
  1801. /*
  1802. The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
  1803. by the number of bits that have been consumed.
  1804. */
  1805. if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
  1806. drflac__update_crc16(bs);
  1807. } else {
  1808. /* We only accumulate the consumed bits. */
  1809. bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);
  1810. /*
  1811. The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
  1812. so we can handle that later.
  1813. */
  1814. bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
  1815. }
  1816. return bs->crc16;
  1817. }
  1818. #endif
  1819. static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
  1820. {
  1821. size_t bytesRead;
  1822. size_t alignedL1LineCount;
  1823. /* Fast path. Try loading straight from L2. */
  1824. if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
  1825. bs->cache = bs->cacheL2[bs->nextL2Line++];
  1826. return DRFLAC_TRUE;
  1827. }
  1828. /*
  1829. If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
  1830. any left.
  1831. */
  1832. if (bs->unalignedByteCount > 0) {
  1833. return DRFLAC_FALSE; /* If we have any unaligned bytes it means there's no more aligned bytes left in the client. */
  1834. }
  1835. bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));
  1836. bs->nextL2Line = 0;
  1837. if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
  1838. bs->cache = bs->cacheL2[bs->nextL2Line++];
  1839. return DRFLAC_TRUE;
  1840. }
  1841. /*
  1842. If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
  1843. means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
  1844. and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
  1845. the size of the L1 so we'll need to seek backwards by any misaligned bytes.
  1846. */
  1847. alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);
  1848. /* We need to keep track of any unaligned bytes for later use. */
  1849. bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
  1850. if (bs->unalignedByteCount > 0) {
  1851. bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
  1852. }
  1853. if (alignedL1LineCount > 0) {
  1854. size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
  1855. size_t i;
  1856. for (i = alignedL1LineCount; i > 0; --i) {
  1857. bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
  1858. }
  1859. bs->nextL2Line = (drflac_uint32)offset;
  1860. bs->cache = bs->cacheL2[bs->nextL2Line++];
  1861. return DRFLAC_TRUE;
  1862. } else {
  1863. /* If we get into this branch it means we weren't able to load any L1-aligned data. */
  1864. bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
  1865. return DRFLAC_FALSE;
  1866. }
  1867. }
  1868. static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
  1869. {
  1870. size_t bytesRead;
  1871. #ifndef DR_FLAC_NO_CRC
  1872. drflac__update_crc16(bs);
  1873. #endif
  1874. /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
  1875. if (drflac__reload_l1_cache_from_l2(bs)) {
  1876. bs->cache = drflac__be2host__cache_line(bs->cache);
  1877. bs->consumedBits = 0;
  1878. #ifndef DR_FLAC_NO_CRC
  1879. bs->crc16Cache = bs->cache;
  1880. #endif
  1881. return DRFLAC_TRUE;
  1882. }
  1883. /* Slow path. */
  1884. /*
  1885. If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
  1886. few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
  1887. data from the unaligned cache.
  1888. */
  1889. bytesRead = bs->unalignedByteCount;
  1890. if (bytesRead == 0) {
  1891. bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- The stream has been exhausted, so marked the bits as consumed. */
  1892. return DRFLAC_FALSE;
  1893. }
  1894. DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
  1895. bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;
  1896. bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
  1897. bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs)); /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
  1898. bs->unalignedByteCount = 0; /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */
  1899. #ifndef DR_FLAC_NO_CRC
  1900. bs->crc16Cache = bs->cache >> bs->consumedBits;
  1901. bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
  1902. #endif
  1903. return DRFLAC_TRUE;
  1904. }
  1905. static void drflac__reset_cache(drflac_bs* bs)
  1906. {
  1907. bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs); /* <-- This clears the L2 cache. */
  1908. bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- This clears the L1 cache. */
  1909. bs->cache = 0;
  1910. bs->unalignedByteCount = 0; /* <-- This clears the trailing unaligned bytes. */
  1911. bs->unalignedCache = 0;
  1912. #ifndef DR_FLAC_NO_CRC
  1913. bs->crc16Cache = 0;
  1914. bs->crc16CacheIgnoredBytes = 0;
  1915. #endif
  1916. }
  1917. static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
  1918. {
  1919. DRFLAC_ASSERT(bs != NULL);
  1920. DRFLAC_ASSERT(pResultOut != NULL);
  1921. DRFLAC_ASSERT(bitCount > 0);
  1922. DRFLAC_ASSERT(bitCount <= 32);
  1923. if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
  1924. if (!drflac__reload_cache(bs)) {
  1925. return DRFLAC_FALSE;
  1926. }
  1927. }
  1928. if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
  1929. /*
  1930. If we want to load all 32-bits from a 32-bit cache we need to do it slightly differently because we can't do
  1931. a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
  1932. more optimal solution for this.
  1933. */
  1934. #ifdef DRFLAC_64BIT
  1935. *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
  1936. bs->consumedBits += bitCount;
  1937. bs->cache <<= bitCount;
  1938. #else
  1939. if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
  1940. *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
  1941. bs->consumedBits += bitCount;
  1942. bs->cache <<= bitCount;
  1943. } else {
  1944. /* Cannot shift by 32-bits, so need to do it differently. */
  1945. *pResultOut = (drflac_uint32)bs->cache;
  1946. bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
  1947. bs->cache = 0;
  1948. }
  1949. #endif
  1950. return DRFLAC_TRUE;
  1951. } else {
  1952. /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
  1953. drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
  1954. drflac_uint32 bitCountLo = bitCount - bitCountHi;
  1955. drflac_uint32 resultHi;
  1956. DRFLAC_ASSERT(bitCountHi > 0);
  1957. DRFLAC_ASSERT(bitCountHi < 32);
  1958. resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);
  1959. if (!drflac__reload_cache(bs)) {
  1960. return DRFLAC_FALSE;
  1961. }
  1962. *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
  1963. bs->consumedBits += bitCountLo;
  1964. bs->cache <<= bitCountLo;
  1965. return DRFLAC_TRUE;
  1966. }
  1967. }
  1968. static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
  1969. {
  1970. drflac_uint32 result;
  1971. DRFLAC_ASSERT(bs != NULL);
  1972. DRFLAC_ASSERT(pResult != NULL);
  1973. DRFLAC_ASSERT(bitCount > 0);
  1974. DRFLAC_ASSERT(bitCount <= 32);
  1975. if (!drflac__read_uint32(bs, bitCount, &result)) {
  1976. return DRFLAC_FALSE;
  1977. }
  1978. /* Do not attempt to shift by 32 as it's undefined. */
  1979. if (bitCount < 32) {
  1980. drflac_uint32 signbit;
  1981. signbit = ((result >> (bitCount-1)) & 0x01);
  1982. result |= (~signbit + 1) << bitCount;
  1983. }
  1984. *pResult = (drflac_int32)result;
  1985. return DRFLAC_TRUE;
  1986. }
  1987. #ifdef DRFLAC_64BIT
  1988. static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
  1989. {
  1990. drflac_uint32 resultHi;
  1991. drflac_uint32 resultLo;
  1992. DRFLAC_ASSERT(bitCount <= 64);
  1993. DRFLAC_ASSERT(bitCount > 32);
  1994. if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
  1995. return DRFLAC_FALSE;
  1996. }
  1997. if (!drflac__read_uint32(bs, 32, &resultLo)) {
  1998. return DRFLAC_FALSE;
  1999. }
  2000. *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
  2001. return DRFLAC_TRUE;
  2002. }
  2003. #endif
  2004. /* Function below is unused, but leaving it here in case I need to quickly add it again. */
  2005. #if 0
  2006. static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
  2007. {
  2008. drflac_uint64 result;
  2009. drflac_uint64 signbit;
  2010. DRFLAC_ASSERT(bitCount <= 64);
  2011. if (!drflac__read_uint64(bs, bitCount, &result)) {
  2012. return DRFLAC_FALSE;
  2013. }
  2014. signbit = ((result >> (bitCount-1)) & 0x01);
  2015. result |= (~signbit + 1) << bitCount;
  2016. *pResultOut = (drflac_int64)result;
  2017. return DRFLAC_TRUE;
  2018. }
  2019. #endif
  2020. static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
  2021. {
  2022. drflac_uint32 result;
  2023. DRFLAC_ASSERT(bs != NULL);
  2024. DRFLAC_ASSERT(pResult != NULL);
  2025. DRFLAC_ASSERT(bitCount > 0);
  2026. DRFLAC_ASSERT(bitCount <= 16);
  2027. if (!drflac__read_uint32(bs, bitCount, &result)) {
  2028. return DRFLAC_FALSE;
  2029. }
  2030. *pResult = (drflac_uint16)result;
  2031. return DRFLAC_TRUE;
  2032. }
  2033. #if 0
  2034. static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
  2035. {
  2036. drflac_int32 result;
  2037. DRFLAC_ASSERT(bs != NULL);
  2038. DRFLAC_ASSERT(pResult != NULL);
  2039. DRFLAC_ASSERT(bitCount > 0);
  2040. DRFLAC_ASSERT(bitCount <= 16);
  2041. if (!drflac__read_int32(bs, bitCount, &result)) {
  2042. return DRFLAC_FALSE;
  2043. }
  2044. *pResult = (drflac_int16)result;
  2045. return DRFLAC_TRUE;
  2046. }
  2047. #endif
  2048. static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
  2049. {
  2050. drflac_uint32 result;
  2051. DRFLAC_ASSERT(bs != NULL);
  2052. DRFLAC_ASSERT(pResult != NULL);
  2053. DRFLAC_ASSERT(bitCount > 0);
  2054. DRFLAC_ASSERT(bitCount <= 8);
  2055. if (!drflac__read_uint32(bs, bitCount, &result)) {
  2056. return DRFLAC_FALSE;
  2057. }
  2058. *pResult = (drflac_uint8)result;
  2059. return DRFLAC_TRUE;
  2060. }
  2061. static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
  2062. {
  2063. drflac_int32 result;
  2064. DRFLAC_ASSERT(bs != NULL);
  2065. DRFLAC_ASSERT(pResult != NULL);
  2066. DRFLAC_ASSERT(bitCount > 0);
  2067. DRFLAC_ASSERT(bitCount <= 8);
  2068. if (!drflac__read_int32(bs, bitCount, &result)) {
  2069. return DRFLAC_FALSE;
  2070. }
  2071. *pResult = (drflac_int8)result;
  2072. return DRFLAC_TRUE;
  2073. }
  2074. static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
  2075. {
  2076. if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
  2077. bs->consumedBits += (drflac_uint32)bitsToSeek;
  2078. bs->cache <<= bitsToSeek;
  2079. return DRFLAC_TRUE;
  2080. } else {
  2081. /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
  2082. bitsToSeek -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
  2083. bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
  2084. bs->cache = 0;
  2085. /* Simple case. Seek in groups of the same number as bits that fit within a cache line. */
  2086. #ifdef DRFLAC_64BIT
  2087. while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
  2088. drflac_uint64 bin;
  2089. if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
  2090. return DRFLAC_FALSE;
  2091. }
  2092. bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
  2093. }
  2094. #else
  2095. while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
  2096. drflac_uint32 bin;
  2097. if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
  2098. return DRFLAC_FALSE;
  2099. }
  2100. bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
  2101. }
  2102. #endif
  2103. /* Whole leftover bytes. */
  2104. while (bitsToSeek >= 8) {
  2105. drflac_uint8 bin;
  2106. if (!drflac__read_uint8(bs, 8, &bin)) {
  2107. return DRFLAC_FALSE;
  2108. }
  2109. bitsToSeek -= 8;
  2110. }
  2111. /* Leftover bits. */
  2112. if (bitsToSeek > 0) {
  2113. drflac_uint8 bin;
  2114. if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
  2115. return DRFLAC_FALSE;
  2116. }
  2117. bitsToSeek = 0; /* <-- Necessary for the assert below. */
  2118. }
  2119. DRFLAC_ASSERT(bitsToSeek == 0);
  2120. return DRFLAC_TRUE;
  2121. }
  2122. }
  2123. /* This function moves the bit streamer to the first bit after the sync code (bit 15 of the of the frame header). It will also update the CRC-16. */
  2124. static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
  2125. {
  2126. DRFLAC_ASSERT(bs != NULL);
  2127. /*
  2128. The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
  2129. thing to do is align to the next byte.
  2130. */
  2131. if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
  2132. return DRFLAC_FALSE;
  2133. }
  2134. for (;;) {
  2135. drflac_uint8 hi;
  2136. #ifndef DR_FLAC_NO_CRC
  2137. drflac__reset_crc16(bs);
  2138. #endif
  2139. if (!drflac__read_uint8(bs, 8, &hi)) {
  2140. return DRFLAC_FALSE;
  2141. }
  2142. if (hi == 0xFF) {
  2143. drflac_uint8 lo;
  2144. if (!drflac__read_uint8(bs, 6, &lo)) {
  2145. return DRFLAC_FALSE;
  2146. }
  2147. if (lo == 0x3E) {
  2148. return DRFLAC_TRUE;
  2149. } else {
  2150. if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
  2151. return DRFLAC_FALSE;
  2152. }
  2153. }
  2154. }
  2155. }
  2156. /* Should never get here. */
  2157. /*return DRFLAC_FALSE;*/
  2158. }
  2159. #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
  2160. #define DRFLAC_IMPLEMENT_CLZ_LZCNT
  2161. #endif
  2162. #if defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
  2163. #define DRFLAC_IMPLEMENT_CLZ_MSVC
  2164. #endif
  2165. #if defined(__WATCOMC__) && defined(__386__)
  2166. #define DRFLAC_IMPLEMENT_CLZ_WATCOM
  2167. #endif
  2168. static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
  2169. {
  2170. drflac_uint32 n;
  2171. static drflac_uint32 clz_table_4[] = {
  2172. 0,
  2173. 4,
  2174. 3, 3,
  2175. 2, 2, 2, 2,
  2176. 1, 1, 1, 1, 1, 1, 1, 1
  2177. };
  2178. if (x == 0) {
  2179. return sizeof(x)*8;
  2180. }
  2181. n = clz_table_4[x >> (sizeof(x)*8 - 4)];
  2182. if (n == 0) {
  2183. #ifdef DRFLAC_64BIT
  2184. if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n = 32; x <<= 32; }
  2185. if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
  2186. if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8; x <<= 8; }
  2187. if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4; x <<= 4; }
  2188. #else
  2189. if ((x & 0xFFFF0000) == 0) { n = 16; x <<= 16; }
  2190. if ((x & 0xFF000000) == 0) { n += 8; x <<= 8; }
  2191. if ((x & 0xF0000000) == 0) { n += 4; x <<= 4; }
  2192. #endif
  2193. n += clz_table_4[x >> (sizeof(x)*8 - 4)];
  2194. }
  2195. return n - 1;
  2196. }
  2197. #ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
  2198. static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
  2199. {
  2200. /* Fast compile time check for ARM. */
  2201. #if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
  2202. return DRFLAC_TRUE;
  2203. #else
  2204. /* If the compiler itself does not support the intrinsic then we'll need to return false. */
  2205. #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
  2206. return drflac__gIsLZCNTSupported;
  2207. #else
  2208. return DRFLAC_FALSE;
  2209. #endif
  2210. #endif
  2211. }
  2212. static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
  2213. {
  2214. /*
  2215. It's critical for competitive decoding performance that this function be highly optimal. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
  2216. to achieve good performance, however on GCC and Clang it's a little bit more annoying. The __builtin_clzl() and __builtin_clzll() intrinsics leave
  2217. it undefined as to the return value when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether or not it's a 32- or
  2218. 64-bit build. To work around this we would need to add a conditional to check for the x = 0 case, but this creates unnecessary inefficiency. To work
  2219. around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly which removes the need to include
  2220. the conditional. This has worked well in the past, but for some reason Clang's MSVC compatible driver, clang-cl, does not seem to be handling this
  2221. in the same way as the normal Clang driver. It seems that `clang-cl` is just outputting the wrong results sometimes, maybe due to some register
  2222. getting clobbered?
  2223. I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl` or just a misunderstanding on my part with inline
  2224. assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.
  2225. Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
  2226. compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
  2227. to know how to fix the inlined assembly for correctness sake, however.
  2228. */
  2229. #if defined(_MSC_VER) /*&& !defined(__clang__)*/ /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
  2230. #ifdef DRFLAC_64BIT
  2231. return (drflac_uint32)__lzcnt64(x);
  2232. #else
  2233. return (drflac_uint32)__lzcnt(x);
  2234. #endif
  2235. #else
  2236. #if defined(__GNUC__) || defined(__clang__)
  2237. #if defined(DRFLAC_X64)
  2238. {
  2239. drflac_uint64 r;
  2240. __asm__ __volatile__ (
  2241. "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
  2242. );
  2243. return (drflac_uint32)r;
  2244. }
  2245. #elif defined(DRFLAC_X86)
  2246. {
  2247. drflac_uint32 r;
  2248. __asm__ __volatile__ (
  2249. "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
  2250. );
  2251. return r;
  2252. }
  2253. #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(DRFLAC_64BIT) /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
  2254. {
  2255. unsigned int r;
  2256. __asm__ __volatile__ (
  2257. #if defined(DRFLAC_64BIT)
  2258. "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
  2259. #else
  2260. "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
  2261. #endif
  2262. );
  2263. return r;
  2264. }
  2265. #else
  2266. if (x == 0) {
  2267. return sizeof(x)*8;
  2268. }
  2269. #ifdef DRFLAC_64BIT
  2270. return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
  2271. #else
  2272. return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
  2273. #endif
  2274. #endif
  2275. #else
  2276. /* Unsupported compiler. */
  2277. #error "This compiler does not support the lzcnt intrinsic."
  2278. #endif
  2279. #endif
  2280. }
  2281. #endif
  2282. #ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
  2283. #include <intrin.h> /* For BitScanReverse(). */
  2284. static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
  2285. {
  2286. drflac_uint32 n;
  2287. if (x == 0) {
  2288. return sizeof(x)*8;
  2289. }
  2290. #ifdef DRFLAC_64BIT
  2291. _BitScanReverse64((unsigned long*)&n, x);
  2292. #else
  2293. _BitScanReverse((unsigned long*)&n, x);
  2294. #endif
  2295. return sizeof(x)*8 - n - 1;
  2296. }
  2297. #endif
  2298. #ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
  2299. static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
  2300. #pragma aux drflac__clz_watcom = \
  2301. "bsr eax, eax" \
  2302. "xor eax, 31" \
  2303. parm [eax] nomemory \
  2304. value [eax] \
  2305. modify exact [eax] nomemory;
  2306. #endif
  2307. static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
  2308. {
  2309. #ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
  2310. if (drflac__is_lzcnt_supported()) {
  2311. return drflac__clz_lzcnt(x);
  2312. } else
  2313. #endif
  2314. {
  2315. #ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
  2316. return drflac__clz_msvc(x);
  2317. #elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
  2318. return (x == 0) ? sizeof(x)*8 : drflac__clz_watcom(x);
  2319. #else
  2320. return drflac__clz_software(x);
  2321. #endif
  2322. }
  2323. }
  2324. static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
  2325. {
  2326. drflac_uint32 zeroCounter = 0;
  2327. drflac_uint32 setBitOffsetPlus1;
  2328. while (bs->cache == 0) {
  2329. zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
  2330. if (!drflac__reload_cache(bs)) {
  2331. return DRFLAC_FALSE;
  2332. }
  2333. }
  2334. setBitOffsetPlus1 = drflac__clz(bs->cache);
  2335. setBitOffsetPlus1 += 1;
  2336. bs->consumedBits += setBitOffsetPlus1;
  2337. bs->cache <<= setBitOffsetPlus1;
  2338. *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
  2339. return DRFLAC_TRUE;
  2340. }
  2341. static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
  2342. {
  2343. DRFLAC_ASSERT(bs != NULL);
  2344. DRFLAC_ASSERT(offsetFromStart > 0);
  2345. /*
  2346. Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
  2347. is intentional because it simplifies the implementation of the onSeek callbacks), however offsetFromStart is unsigned 64-bit.
  2348. To resolve we just need to do an initial seek from the start, and then a series of offset seeks to make up the remainder.
  2349. */
  2350. if (offsetFromStart > 0x7FFFFFFF) {
  2351. drflac_uint64 bytesRemaining = offsetFromStart;
  2352. if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
  2353. return DRFLAC_FALSE;
  2354. }
  2355. bytesRemaining -= 0x7FFFFFFF;
  2356. while (bytesRemaining > 0x7FFFFFFF) {
  2357. if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
  2358. return DRFLAC_FALSE;
  2359. }
  2360. bytesRemaining -= 0x7FFFFFFF;
  2361. }
  2362. if (bytesRemaining > 0) {
  2363. if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
  2364. return DRFLAC_FALSE;
  2365. }
  2366. }
  2367. } else {
  2368. if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
  2369. return DRFLAC_FALSE;
  2370. }
  2371. }
  2372. /* The cache should be reset to force a reload of fresh data from the client. */
  2373. drflac__reset_cache(bs);
  2374. return DRFLAC_TRUE;
  2375. }
  2376. static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
  2377. {
  2378. drflac_uint8 crc;
  2379. drflac_uint64 result;
  2380. drflac_uint8 utf8[7] = {0};
  2381. int byteCount;
  2382. int i;
  2383. DRFLAC_ASSERT(bs != NULL);
  2384. DRFLAC_ASSERT(pNumberOut != NULL);
  2385. DRFLAC_ASSERT(pCRCOut != NULL);
  2386. crc = *pCRCOut;
  2387. if (!drflac__read_uint8(bs, 8, utf8)) {
  2388. *pNumberOut = 0;
  2389. return DRFLAC_AT_END;
  2390. }
  2391. crc = drflac_crc8(crc, utf8[0], 8);
  2392. if ((utf8[0] & 0x80) == 0) {
  2393. *pNumberOut = utf8[0];
  2394. *pCRCOut = crc;
  2395. return DRFLAC_SUCCESS;
  2396. }
  2397. /*byteCount = 1;*/
  2398. if ((utf8[0] & 0xE0) == 0xC0) {
  2399. byteCount = 2;
  2400. } else if ((utf8[0] & 0xF0) == 0xE0) {
  2401. byteCount = 3;
  2402. } else if ((utf8[0] & 0xF8) == 0xF0) {
  2403. byteCount = 4;
  2404. } else if ((utf8[0] & 0xFC) == 0xF8) {
  2405. byteCount = 5;
  2406. } else if ((utf8[0] & 0xFE) == 0xFC) {
  2407. byteCount = 6;
  2408. } else if ((utf8[0] & 0xFF) == 0xFE) {
  2409. byteCount = 7;
  2410. } else {
  2411. *pNumberOut = 0;
  2412. return DRFLAC_CRC_MISMATCH; /* Bad UTF-8 encoding. */
  2413. }
  2414. /* Read extra bytes. */
  2415. DRFLAC_ASSERT(byteCount > 1);
  2416. result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
  2417. for (i = 1; i < byteCount; ++i) {
  2418. if (!drflac__read_uint8(bs, 8, utf8 + i)) {
  2419. *pNumberOut = 0;
  2420. return DRFLAC_AT_END;
  2421. }
  2422. crc = drflac_crc8(crc, utf8[i], 8);
  2423. result = (result << 6) | (utf8[i] & 0x3F);
  2424. }
  2425. *pNumberOut = result;
  2426. *pCRCOut = crc;
  2427. return DRFLAC_SUCCESS;
  2428. }
  2429. /*
  2430. The next two functions are responsible for calculating the prediction.
  2431. When the bits per sample is >16 we need to use 64-bit integer arithmetic because otherwise we'll run out of precision. It's
  2432. safe to assume this will be slower on 32-bit platforms so we use a more optimal solution when the bits per sample is <=16.
  2433. */
  2434. static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
  2435. {
  2436. drflac_int32 prediction = 0;
  2437. DRFLAC_ASSERT(order <= 32);
  2438. /* 32-bit version. */
  2439. /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. */
  2440. switch (order)
  2441. {
  2442. case 32: prediction += coefficients[31] * pDecodedSamples[-32];
  2443. case 31: prediction += coefficients[30] * pDecodedSamples[-31];
  2444. case 30: prediction += coefficients[29] * pDecodedSamples[-30];
  2445. case 29: prediction += coefficients[28] * pDecodedSamples[-29];
  2446. case 28: prediction += coefficients[27] * pDecodedSamples[-28];
  2447. case 27: prediction += coefficients[26] * pDecodedSamples[-27];
  2448. case 26: prediction += coefficients[25] * pDecodedSamples[-26];
  2449. case 25: prediction += coefficients[24] * pDecodedSamples[-25];
  2450. case 24: prediction += coefficients[23] * pDecodedSamples[-24];
  2451. case 23: prediction += coefficients[22] * pDecodedSamples[-23];
  2452. case 22: prediction += coefficients[21] * pDecodedSamples[-22];
  2453. case 21: prediction += coefficients[20] * pDecodedSamples[-21];
  2454. case 20: prediction += coefficients[19] * pDecodedSamples[-20];
  2455. case 19: prediction += coefficients[18] * pDecodedSamples[-19];
  2456. case 18: prediction += coefficients[17] * pDecodedSamples[-18];
  2457. case 17: prediction += coefficients[16] * pDecodedSamples[-17];
  2458. case 16: prediction += coefficients[15] * pDecodedSamples[-16];
  2459. case 15: prediction += coefficients[14] * pDecodedSamples[-15];
  2460. case 14: prediction += coefficients[13] * pDecodedSamples[-14];
  2461. case 13: prediction += coefficients[12] * pDecodedSamples[-13];
  2462. case 12: prediction += coefficients[11] * pDecodedSamples[-12];
  2463. case 11: prediction += coefficients[10] * pDecodedSamples[-11];
  2464. case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
  2465. case 9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
  2466. case 8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
  2467. case 7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
  2468. case 6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
  2469. case 5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
  2470. case 4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
  2471. case 3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
  2472. case 2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
  2473. case 1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
  2474. }
  2475. return (drflac_int32)(prediction >> shift);
  2476. }
  2477. static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
  2478. {
  2479. drflac_int64 prediction;
  2480. DRFLAC_ASSERT(order <= 32);
  2481. /* 64-bit version. */
  2482. /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
  2483. #ifndef DRFLAC_64BIT
  2484. if (order == 8)
  2485. {
  2486. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2487. prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
  2488. prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
  2489. prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
  2490. prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
  2491. prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
  2492. prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
  2493. prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
  2494. }
  2495. else if (order == 7)
  2496. {
  2497. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2498. prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
  2499. prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
  2500. prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
  2501. prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
  2502. prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
  2503. prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
  2504. }
  2505. else if (order == 3)
  2506. {
  2507. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2508. prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
  2509. prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
  2510. }
  2511. else if (order == 6)
  2512. {
  2513. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2514. prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
  2515. prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
  2516. prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
  2517. prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
  2518. prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
  2519. }
  2520. else if (order == 5)
  2521. {
  2522. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2523. prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
  2524. prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
  2525. prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
  2526. prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
  2527. }
  2528. else if (order == 4)
  2529. {
  2530. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2531. prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
  2532. prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
  2533. prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
  2534. }
  2535. else if (order == 12)
  2536. {
  2537. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2538. prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
  2539. prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
  2540. prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
  2541. prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
  2542. prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
  2543. prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
  2544. prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
  2545. prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
  2546. prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
  2547. prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
  2548. prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
  2549. }
  2550. else if (order == 2)
  2551. {
  2552. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2553. prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
  2554. }
  2555. else if (order == 1)
  2556. {
  2557. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2558. }
  2559. else if (order == 10)
  2560. {
  2561. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2562. prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
  2563. prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
  2564. prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
  2565. prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
  2566. prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
  2567. prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
  2568. prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
  2569. prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
  2570. prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
  2571. }
  2572. else if (order == 9)
  2573. {
  2574. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2575. prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
  2576. prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
  2577. prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
  2578. prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
  2579. prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
  2580. prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
  2581. prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
  2582. prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
  2583. }
  2584. else if (order == 11)
  2585. {
  2586. prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
  2587. prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
  2588. prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
  2589. prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
  2590. prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
  2591. prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
  2592. prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
  2593. prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
  2594. prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
  2595. prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
  2596. prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
  2597. }
  2598. else
  2599. {
  2600. int j;
  2601. prediction = 0;
  2602. for (j = 0; j < (int)order; ++j) {
  2603. prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
  2604. }
  2605. }
  2606. #endif
  2607. /*
  2608. VC++ optimizes this to a single jmp instruction, but only the 64-bit build. The 32-bit build generates less efficient code for some
  2609. reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
  2610. */
  2611. #ifdef DRFLAC_64BIT
  2612. prediction = 0;
  2613. switch (order)
  2614. {
  2615. case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
  2616. case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
  2617. case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
  2618. case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
  2619. case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
  2620. case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
  2621. case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
  2622. case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
  2623. case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
  2624. case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
  2625. case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
  2626. case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
  2627. case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
  2628. case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
  2629. case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
  2630. case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
  2631. case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
  2632. case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
  2633. case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
  2634. case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
  2635. case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
  2636. case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
  2637. case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
  2638. case 9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
  2639. case 8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
  2640. case 7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
  2641. case 6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
  2642. case 5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
  2643. case 4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
  2644. case 3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
  2645. case 2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
  2646. case 1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
  2647. }
  2648. #endif
  2649. return (drflac_int32)(prediction >> shift);
  2650. }
  2651. #if 0
  2652. /*
  2653. Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
  2654. sake of readability and should only be used as a reference.
  2655. */
  2656. static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
  2657. {
  2658. drflac_uint32 i;
  2659. DRFLAC_ASSERT(bs != NULL);
  2660. DRFLAC_ASSERT(pSamplesOut != NULL);
  2661. for (i = 0; i < count; ++i) {
  2662. drflac_uint32 zeroCounter = 0;
  2663. for (;;) {
  2664. drflac_uint8 bit;
  2665. if (!drflac__read_uint8(bs, 1, &bit)) {
  2666. return DRFLAC_FALSE;
  2667. }
  2668. if (bit == 0) {
  2669. zeroCounter += 1;
  2670. } else {
  2671. break;
  2672. }
  2673. }
  2674. drflac_uint32 decodedRice;
  2675. if (riceParam > 0) {
  2676. if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
  2677. return DRFLAC_FALSE;
  2678. }
  2679. } else {
  2680. decodedRice = 0;
  2681. }
  2682. decodedRice |= (zeroCounter << riceParam);
  2683. if ((decodedRice & 0x01)) {
  2684. decodedRice = ~(decodedRice >> 1);
  2685. } else {
  2686. decodedRice = (decodedRice >> 1);
  2687. }
  2688. if (bitsPerSample+shift >= 32) {
  2689. pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + i);
  2690. } else {
  2691. pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + i);
  2692. }
  2693. }
  2694. return DRFLAC_TRUE;
  2695. }
  2696. #endif
  2697. #if 0
  2698. static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
  2699. {
  2700. drflac_uint32 zeroCounter = 0;
  2701. drflac_uint32 decodedRice;
  2702. for (;;) {
  2703. drflac_uint8 bit;
  2704. if (!drflac__read_uint8(bs, 1, &bit)) {
  2705. return DRFLAC_FALSE;
  2706. }
  2707. if (bit == 0) {
  2708. zeroCounter += 1;
  2709. } else {
  2710. break;
  2711. }
  2712. }
  2713. if (riceParam > 0) {
  2714. if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
  2715. return DRFLAC_FALSE;
  2716. }
  2717. } else {
  2718. decodedRice = 0;
  2719. }
  2720. *pZeroCounterOut = zeroCounter;
  2721. *pRiceParamPartOut = decodedRice;
  2722. return DRFLAC_TRUE;
  2723. }
  2724. #endif
  2725. #if 0
  2726. static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
  2727. {
  2728. drflac_cache_t riceParamMask;
  2729. drflac_uint32 zeroCounter;
  2730. drflac_uint32 setBitOffsetPlus1;
  2731. drflac_uint32 riceParamPart;
  2732. drflac_uint32 riceLength;
  2733. DRFLAC_ASSERT(riceParam > 0); /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */
  2734. riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);
  2735. zeroCounter = 0;
  2736. while (bs->cache == 0) {
  2737. zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
  2738. if (!drflac__reload_cache(bs)) {
  2739. return DRFLAC_FALSE;
  2740. }
  2741. }
  2742. setBitOffsetPlus1 = drflac__clz(bs->cache);
  2743. zeroCounter += setBitOffsetPlus1;
  2744. setBitOffsetPlus1 += 1;
  2745. riceLength = setBitOffsetPlus1 + riceParam;
  2746. if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
  2747. riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));
  2748. bs->consumedBits += riceLength;
  2749. bs->cache <<= riceLength;
  2750. } else {
  2751. drflac_uint32 bitCountLo;
  2752. drflac_cache_t resultHi;
  2753. bs->consumedBits += riceLength;
  2754. bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1); /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */
  2755. /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
  2756. bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
  2757. resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam); /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */
  2758. if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
  2759. #ifndef DR_FLAC_NO_CRC
  2760. drflac__update_crc16(bs);
  2761. #endif
  2762. bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
  2763. bs->consumedBits = 0;
  2764. #ifndef DR_FLAC_NO_CRC
  2765. bs->crc16Cache = bs->cache;
  2766. #endif
  2767. } else {
  2768. /* Slow path. We need to fetch more data from the client. */
  2769. if (!drflac__reload_cache(bs)) {
  2770. return DRFLAC_FALSE;
  2771. }
  2772. }
  2773. riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));
  2774. bs->consumedBits += bitCountLo;
  2775. bs->cache <<= bitCountLo;
  2776. }
  2777. pZeroCounterOut[0] = zeroCounter;
  2778. pRiceParamPartOut[0] = riceParamPart;
  2779. return DRFLAC_TRUE;
  2780. }
  2781. #endif
  2782. static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
  2783. {
  2784. drflac_uint32 riceParamPlus1 = riceParam + 1;
  2785. /*drflac_cache_t riceParamPlus1Mask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
  2786. drflac_uint32 riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
  2787. drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
  2788. /*
  2789. The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
  2790. no idea how this will work in practice...
  2791. */
  2792. drflac_cache_t bs_cache = bs->cache;
  2793. drflac_uint32 bs_consumedBits = bs->consumedBits;
  2794. /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
  2795. drflac_uint32 lzcount = drflac__clz(bs_cache);
  2796. if (lzcount < sizeof(bs_cache)*8) {
  2797. pZeroCounterOut[0] = lzcount;
  2798. /*
  2799. It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
  2800. this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
  2801. outside of this function at a higher level.
  2802. */
  2803. extract_rice_param_part:
  2804. bs_cache <<= lzcount;
  2805. bs_consumedBits += lzcount;
  2806. if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
  2807. /* Getting here means the rice parameter part is wholly contained within the current cache line. */
  2808. pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
  2809. bs_cache <<= riceParamPlus1;
  2810. bs_consumedBits += riceParamPlus1;
  2811. } else {
  2812. drflac_uint32 riceParamPartHi;
  2813. drflac_uint32 riceParamPartLo;
  2814. drflac_uint32 riceParamPartLoBitCount;
  2815. /*
  2816. Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
  2817. line, reload the cache, and then combine it with the head of the next cache line.
  2818. */
  2819. /* Grab the high part of the rice parameter part. */
  2820. riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
  2821. /* Before reloading the cache we need to grab the size in bits of the low part. */
  2822. riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
  2823. DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
  2824. /* Now reload the cache. */
  2825. if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
  2826. #ifndef DR_FLAC_NO_CRC
  2827. drflac__update_crc16(bs);
  2828. #endif
  2829. bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
  2830. bs_consumedBits = riceParamPartLoBitCount;
  2831. #ifndef DR_FLAC_NO_CRC
  2832. bs->crc16Cache = bs_cache;
  2833. #endif
  2834. } else {
  2835. /* Slow path. We need to fetch more data from the client. */
  2836. if (!drflac__reload_cache(bs)) {
  2837. return DRFLAC_FALSE;
  2838. }
  2839. bs_cache = bs->cache;
  2840. bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
  2841. }
  2842. /* We should now have enough information to construct the rice parameter part. */
  2843. riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
  2844. pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;
  2845. bs_cache <<= riceParamPartLoBitCount;
  2846. }
  2847. } else {
  2848. /*
  2849. Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
  2850. to drflac__clz() and we need to reload the cache.
  2851. */
  2852. drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
  2853. for (;;) {
  2854. if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
  2855. #ifndef DR_FLAC_NO_CRC
  2856. drflac__update_crc16(bs);
  2857. #endif
  2858. bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
  2859. bs_consumedBits = 0;
  2860. #ifndef DR_FLAC_NO_CRC
  2861. bs->crc16Cache = bs_cache;
  2862. #endif
  2863. } else {
  2864. /* Slow path. We need to fetch more data from the client. */
  2865. if (!drflac__reload_cache(bs)) {
  2866. return DRFLAC_FALSE;
  2867. }
  2868. bs_cache = bs->cache;
  2869. bs_consumedBits = bs->consumedBits;
  2870. }
  2871. lzcount = drflac__clz(bs_cache);
  2872. zeroCounter += lzcount;
  2873. if (lzcount < sizeof(bs_cache)*8) {
  2874. break;
  2875. }
  2876. }
  2877. pZeroCounterOut[0] = zeroCounter;
  2878. goto extract_rice_param_part;
  2879. }
  2880. /* Make sure the cache is restored at the end of it all. */
  2881. bs->cache = bs_cache;
  2882. bs->consumedBits = bs_consumedBits;
  2883. return DRFLAC_TRUE;
  2884. }
  2885. static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
  2886. {
  2887. drflac_uint32 riceParamPlus1 = riceParam + 1;
  2888. drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
  2889. /*
  2890. The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
  2891. no idea how this will work in practice...
  2892. */
  2893. drflac_cache_t bs_cache = bs->cache;
  2894. drflac_uint32 bs_consumedBits = bs->consumedBits;
  2895. /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
  2896. drflac_uint32 lzcount = drflac__clz(bs_cache);
  2897. if (lzcount < sizeof(bs_cache)*8) {
  2898. /*
  2899. It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
  2900. this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
  2901. outside of this function at a higher level.
  2902. */
  2903. extract_rice_param_part:
  2904. bs_cache <<= lzcount;
  2905. bs_consumedBits += lzcount;
  2906. if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
  2907. /* Getting here means the rice parameter part is wholly contained within the current cache line. */
  2908. bs_cache <<= riceParamPlus1;
  2909. bs_consumedBits += riceParamPlus1;
  2910. } else {
  2911. /*
  2912. Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
  2913. line, reload the cache, and then combine it with the head of the next cache line.
  2914. */
  2915. /* Before reloading the cache we need to grab the size in bits of the low part. */
  2916. drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
  2917. DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
  2918. /* Now reload the cache. */
  2919. if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
  2920. #ifndef DR_FLAC_NO_CRC
  2921. drflac__update_crc16(bs);
  2922. #endif
  2923. bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
  2924. bs_consumedBits = riceParamPartLoBitCount;
  2925. #ifndef DR_FLAC_NO_CRC
  2926. bs->crc16Cache = bs_cache;
  2927. #endif
  2928. } else {
  2929. /* Slow path. We need to fetch more data from the client. */
  2930. if (!drflac__reload_cache(bs)) {
  2931. return DRFLAC_FALSE;
  2932. }
  2933. bs_cache = bs->cache;
  2934. bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
  2935. }
  2936. bs_cache <<= riceParamPartLoBitCount;
  2937. }
  2938. } else {
  2939. /*
  2940. Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
  2941. to drflac__clz() and we need to reload the cache.
  2942. */
  2943. for (;;) {
  2944. if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
  2945. #ifndef DR_FLAC_NO_CRC
  2946. drflac__update_crc16(bs);
  2947. #endif
  2948. bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
  2949. bs_consumedBits = 0;
  2950. #ifndef DR_FLAC_NO_CRC
  2951. bs->crc16Cache = bs_cache;
  2952. #endif
  2953. } else {
  2954. /* Slow path. We need to fetch more data from the client. */
  2955. if (!drflac__reload_cache(bs)) {
  2956. return DRFLAC_FALSE;
  2957. }
  2958. bs_cache = bs->cache;
  2959. bs_consumedBits = bs->consumedBits;
  2960. }
  2961. lzcount = drflac__clz(bs_cache);
  2962. if (lzcount < sizeof(bs_cache)*8) {
  2963. break;
  2964. }
  2965. }
  2966. goto extract_rice_param_part;
  2967. }
  2968. /* Make sure the cache is restored at the end of it all. */
  2969. bs->cache = bs_cache;
  2970. bs->consumedBits = bs_consumedBits;
  2971. return DRFLAC_TRUE;
  2972. }
  2973. static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
  2974. {
  2975. drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
  2976. drflac_uint32 zeroCountPart0;
  2977. drflac_uint32 riceParamPart0;
  2978. drflac_uint32 riceParamMask;
  2979. drflac_uint32 i;
  2980. DRFLAC_ASSERT(bs != NULL);
  2981. DRFLAC_ASSERT(pSamplesOut != NULL);
  2982. (void)bitsPerSample;
  2983. (void)order;
  2984. (void)shift;
  2985. (void)coefficients;
  2986. riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
  2987. i = 0;
  2988. while (i < count) {
  2989. /* Rice extraction. */
  2990. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
  2991. return DRFLAC_FALSE;
  2992. }
  2993. /* Rice reconstruction. */
  2994. riceParamPart0 &= riceParamMask;
  2995. riceParamPart0 |= (zeroCountPart0 << riceParam);
  2996. riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
  2997. pSamplesOut[i] = riceParamPart0;
  2998. i += 1;
  2999. }
  3000. return DRFLAC_TRUE;
  3001. }
  3002. static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
  3003. {
  3004. drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
  3005. drflac_uint32 zeroCountPart0 = 0;
  3006. drflac_uint32 zeroCountPart1 = 0;
  3007. drflac_uint32 zeroCountPart2 = 0;
  3008. drflac_uint32 zeroCountPart3 = 0;
  3009. drflac_uint32 riceParamPart0 = 0;
  3010. drflac_uint32 riceParamPart1 = 0;
  3011. drflac_uint32 riceParamPart2 = 0;
  3012. drflac_uint32 riceParamPart3 = 0;
  3013. drflac_uint32 riceParamMask;
  3014. const drflac_int32* pSamplesOutEnd;
  3015. drflac_uint32 i;
  3016. DRFLAC_ASSERT(bs != NULL);
  3017. DRFLAC_ASSERT(pSamplesOut != NULL);
  3018. if (order == 0) {
  3019. return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
  3020. }
  3021. riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
  3022. pSamplesOutEnd = pSamplesOut + (count & ~3);
  3023. if (bitsPerSample+shift > 32) {
  3024. while (pSamplesOut < pSamplesOutEnd) {
  3025. /*
  3026. Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
  3027. against an array. Not sure why, but perhaps it's making more efficient use of registers?
  3028. */
  3029. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
  3030. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
  3031. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
  3032. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
  3033. return DRFLAC_FALSE;
  3034. }
  3035. riceParamPart0 &= riceParamMask;
  3036. riceParamPart1 &= riceParamMask;
  3037. riceParamPart2 &= riceParamMask;
  3038. riceParamPart3 &= riceParamMask;
  3039. riceParamPart0 |= (zeroCountPart0 << riceParam);
  3040. riceParamPart1 |= (zeroCountPart1 << riceParam);
  3041. riceParamPart2 |= (zeroCountPart2 << riceParam);
  3042. riceParamPart3 |= (zeroCountPart3 << riceParam);
  3043. riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
  3044. riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
  3045. riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
  3046. riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
  3047. pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 0);
  3048. pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 1);
  3049. pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 2);
  3050. pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 3);
  3051. pSamplesOut += 4;
  3052. }
  3053. } else {
  3054. while (pSamplesOut < pSamplesOutEnd) {
  3055. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
  3056. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
  3057. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
  3058. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
  3059. return DRFLAC_FALSE;
  3060. }
  3061. riceParamPart0 &= riceParamMask;
  3062. riceParamPart1 &= riceParamMask;
  3063. riceParamPart2 &= riceParamMask;
  3064. riceParamPart3 &= riceParamMask;
  3065. riceParamPart0 |= (zeroCountPart0 << riceParam);
  3066. riceParamPart1 |= (zeroCountPart1 << riceParam);
  3067. riceParamPart2 |= (zeroCountPart2 << riceParam);
  3068. riceParamPart3 |= (zeroCountPart3 << riceParam);
  3069. riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
  3070. riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
  3071. riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
  3072. riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
  3073. pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 0);
  3074. pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 1);
  3075. pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 2);
  3076. pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 3);
  3077. pSamplesOut += 4;
  3078. }
  3079. }
  3080. i = (count & ~3);
  3081. while (i < count) {
  3082. /* Rice extraction. */
  3083. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
  3084. return DRFLAC_FALSE;
  3085. }
  3086. /* Rice reconstruction. */
  3087. riceParamPart0 &= riceParamMask;
  3088. riceParamPart0 |= (zeroCountPart0 << riceParam);
  3089. riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
  3090. /*riceParamPart0 = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/
  3091. /* Sample reconstruction. */
  3092. if (bitsPerSample+shift > 32) {
  3093. pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 0);
  3094. } else {
  3095. pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 0);
  3096. }
  3097. i += 1;
  3098. pSamplesOut += 1;
  3099. }
  3100. return DRFLAC_TRUE;
  3101. }
  3102. #if defined(DRFLAC_SUPPORT_SSE2)
  3103. static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
  3104. {
  3105. __m128i r;
  3106. /* Pack. */
  3107. r = _mm_packs_epi32(a, b);
  3108. /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
  3109. r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));
  3110. /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
  3111. r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
  3112. r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
  3113. return r;
  3114. }
  3115. #endif
  3116. #if defined(DRFLAC_SUPPORT_SSE41)
  3117. static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
  3118. {
  3119. return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
  3120. }
  3121. static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
  3122. {
  3123. __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
  3124. __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
  3125. return _mm_add_epi32(x64, x32);
  3126. }
  3127. static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
  3128. {
  3129. return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
  3130. }
  3131. static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
  3132. {
  3133. /*
  3134. To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
  3135. is shifted with zero bits, whereas the right side is shifted with sign bits.
  3136. */
  3137. __m128i lo = _mm_srli_epi64(x, count);
  3138. __m128i hi = _mm_srai_epi32(x, count);
  3139. hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0)); /* The high part needs to have the low part cleared. */
  3140. return _mm_or_si128(lo, hi);
  3141. }
  3142. static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
  3143. {
  3144. int i;
  3145. drflac_uint32 riceParamMask;
  3146. drflac_int32* pDecodedSamples = pSamplesOut;
  3147. drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
  3148. drflac_uint32 zeroCountParts0 = 0;
  3149. drflac_uint32 zeroCountParts1 = 0;
  3150. drflac_uint32 zeroCountParts2 = 0;
  3151. drflac_uint32 zeroCountParts3 = 0;
  3152. drflac_uint32 riceParamParts0 = 0;
  3153. drflac_uint32 riceParamParts1 = 0;
  3154. drflac_uint32 riceParamParts2 = 0;
  3155. drflac_uint32 riceParamParts3 = 0;
  3156. __m128i coefficients128_0;
  3157. __m128i coefficients128_4;
  3158. __m128i coefficients128_8;
  3159. __m128i samples128_0;
  3160. __m128i samples128_4;
  3161. __m128i samples128_8;
  3162. __m128i riceParamMask128;
  3163. const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
  3164. riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
  3165. riceParamMask128 = _mm_set1_epi32(riceParamMask);
  3166. /* Pre-load. */
  3167. coefficients128_0 = _mm_setzero_si128();
  3168. coefficients128_4 = _mm_setzero_si128();
  3169. coefficients128_8 = _mm_setzero_si128();
  3170. samples128_0 = _mm_setzero_si128();
  3171. samples128_4 = _mm_setzero_si128();
  3172. samples128_8 = _mm_setzero_si128();
  3173. /*
  3174. Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
  3175. what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
  3176. in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
  3177. so I think there's opportunity for this to be simplified.
  3178. */
  3179. #if 1
  3180. {
  3181. int runningOrder = order;
  3182. /* 0 - 3. */
  3183. if (runningOrder >= 4) {
  3184. coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
  3185. samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
  3186. runningOrder -= 4;
  3187. } else {
  3188. switch (runningOrder) {
  3189. case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
  3190. case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
  3191. case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
  3192. }
  3193. runningOrder = 0;
  3194. }
  3195. /* 4 - 7 */
  3196. if (runningOrder >= 4) {
  3197. coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
  3198. samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
  3199. runningOrder -= 4;
  3200. } else {
  3201. switch (runningOrder) {
  3202. case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
  3203. case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
  3204. case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
  3205. }
  3206. runningOrder = 0;
  3207. }
  3208. /* 8 - 11 */
  3209. if (runningOrder == 4) {
  3210. coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
  3211. samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
  3212. runningOrder -= 4;
  3213. } else {
  3214. switch (runningOrder) {
  3215. case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
  3216. case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
  3217. case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
  3218. }
  3219. runningOrder = 0;
  3220. }
  3221. /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
  3222. coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
  3223. coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
  3224. coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
  3225. }
  3226. #else
  3227. /* This causes strict-aliasing warnings with GCC. */
  3228. switch (order)
  3229. {
  3230. case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
  3231. case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
  3232. case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
  3233. case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
  3234. case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
  3235. case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
  3236. case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
  3237. case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
  3238. case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
  3239. case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
  3240. case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
  3241. case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
  3242. }
  3243. #endif
  3244. /* For this version we are doing one sample at a time. */
  3245. while (pDecodedSamples < pDecodedSamplesEnd) {
  3246. __m128i prediction128;
  3247. __m128i zeroCountPart128;
  3248. __m128i riceParamPart128;
  3249. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
  3250. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
  3251. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
  3252. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
  3253. return DRFLAC_FALSE;
  3254. }
  3255. zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
  3256. riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
  3257. riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
  3258. riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
  3259. riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01))); /* <-- SSE2 compatible */
  3260. /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/ /* <-- Only supported from SSE4.1 and is slower in my testing... */
  3261. if (order <= 4) {
  3262. for (i = 0; i < 4; i += 1) {
  3263. prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);
  3264. /* Horizontal add and shift. */
  3265. prediction128 = drflac__mm_hadd_epi32(prediction128);
  3266. prediction128 = _mm_srai_epi32(prediction128, shift);
  3267. prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
  3268. samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
  3269. riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
  3270. }
  3271. } else if (order <= 8) {
  3272. for (i = 0; i < 4; i += 1) {
  3273. prediction128 = _mm_mullo_epi32(coefficients128_4, samples128_4);
  3274. prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
  3275. /* Horizontal add and shift. */
  3276. prediction128 = drflac__mm_hadd_epi32(prediction128);
  3277. prediction128 = _mm_srai_epi32(prediction128, shift);
  3278. prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
  3279. samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
  3280. samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
  3281. riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
  3282. }
  3283. } else {
  3284. for (i = 0; i < 4; i += 1) {
  3285. prediction128 = _mm_mullo_epi32(coefficients128_8, samples128_8);
  3286. prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
  3287. prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
  3288. /* Horizontal add and shift. */
  3289. prediction128 = drflac__mm_hadd_epi32(prediction128);
  3290. prediction128 = _mm_srai_epi32(prediction128, shift);
  3291. prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
  3292. samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
  3293. samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
  3294. samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
  3295. riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
  3296. }
  3297. }
  3298. /* We store samples in groups of 4. */
  3299. _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
  3300. pDecodedSamples += 4;
  3301. }
  3302. /* Make sure we process the last few samples. */
  3303. i = (count & ~3);
  3304. while (i < (int)count) {
  3305. /* Rice extraction. */
  3306. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
  3307. return DRFLAC_FALSE;
  3308. }
  3309. /* Rice reconstruction. */
  3310. riceParamParts0 &= riceParamMask;
  3311. riceParamParts0 |= (zeroCountParts0 << riceParam);
  3312. riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
  3313. /* Sample reconstruction. */
  3314. pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
  3315. i += 1;
  3316. pDecodedSamples += 1;
  3317. }
  3318. return DRFLAC_TRUE;
  3319. }
  3320. static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
  3321. {
  3322. int i;
  3323. drflac_uint32 riceParamMask;
  3324. drflac_int32* pDecodedSamples = pSamplesOut;
  3325. drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
  3326. drflac_uint32 zeroCountParts0 = 0;
  3327. drflac_uint32 zeroCountParts1 = 0;
  3328. drflac_uint32 zeroCountParts2 = 0;
  3329. drflac_uint32 zeroCountParts3 = 0;
  3330. drflac_uint32 riceParamParts0 = 0;
  3331. drflac_uint32 riceParamParts1 = 0;
  3332. drflac_uint32 riceParamParts2 = 0;
  3333. drflac_uint32 riceParamParts3 = 0;
  3334. __m128i coefficients128_0;
  3335. __m128i coefficients128_4;
  3336. __m128i coefficients128_8;
  3337. __m128i samples128_0;
  3338. __m128i samples128_4;
  3339. __m128i samples128_8;
  3340. __m128i prediction128;
  3341. __m128i riceParamMask128;
  3342. const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
  3343. DRFLAC_ASSERT(order <= 12);
  3344. riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
  3345. riceParamMask128 = _mm_set1_epi32(riceParamMask);
  3346. prediction128 = _mm_setzero_si128();
  3347. /* Pre-load. */
  3348. coefficients128_0 = _mm_setzero_si128();
  3349. coefficients128_4 = _mm_setzero_si128();
  3350. coefficients128_8 = _mm_setzero_si128();
  3351. samples128_0 = _mm_setzero_si128();
  3352. samples128_4 = _mm_setzero_si128();
  3353. samples128_8 = _mm_setzero_si128();
  3354. #if 1
  3355. {
  3356. int runningOrder = order;
  3357. /* 0 - 3. */
  3358. if (runningOrder >= 4) {
  3359. coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
  3360. samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
  3361. runningOrder -= 4;
  3362. } else {
  3363. switch (runningOrder) {
  3364. case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
  3365. case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
  3366. case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
  3367. }
  3368. runningOrder = 0;
  3369. }
  3370. /* 4 - 7 */
  3371. if (runningOrder >= 4) {
  3372. coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
  3373. samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
  3374. runningOrder -= 4;
  3375. } else {
  3376. switch (runningOrder) {
  3377. case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
  3378. case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
  3379. case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
  3380. }
  3381. runningOrder = 0;
  3382. }
  3383. /* 8 - 11 */
  3384. if (runningOrder == 4) {
  3385. coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
  3386. samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
  3387. runningOrder -= 4;
  3388. } else {
  3389. switch (runningOrder) {
  3390. case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
  3391. case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
  3392. case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
  3393. }
  3394. runningOrder = 0;
  3395. }
  3396. /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
  3397. coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
  3398. coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
  3399. coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
  3400. }
  3401. #else
  3402. switch (order)
  3403. {
  3404. case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
  3405. case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
  3406. case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
  3407. case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
  3408. case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
  3409. case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
  3410. case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
  3411. case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
  3412. case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
  3413. case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
  3414. case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
  3415. case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
  3416. }
  3417. #endif
  3418. /* For this version we are doing one sample at a time. */
  3419. while (pDecodedSamples < pDecodedSamplesEnd) {
  3420. __m128i zeroCountPart128;
  3421. __m128i riceParamPart128;
  3422. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
  3423. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
  3424. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
  3425. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
  3426. return DRFLAC_FALSE;
  3427. }
  3428. zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
  3429. riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
  3430. riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
  3431. riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
  3432. riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));
  3433. for (i = 0; i < 4; i += 1) {
  3434. prediction128 = _mm_xor_si128(prediction128, prediction128); /* Reset to 0. */
  3435. switch (order)
  3436. {
  3437. case 12:
  3438. case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
  3439. case 10:
  3440. case 9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
  3441. case 8:
  3442. case 7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
  3443. case 6:
  3444. case 5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
  3445. case 4:
  3446. case 3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
  3447. case 2:
  3448. case 1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
  3449. }
  3450. /* Horizontal add and shift. */
  3451. prediction128 = drflac__mm_hadd_epi64(prediction128);
  3452. prediction128 = drflac__mm_srai_epi64(prediction128, shift);
  3453. prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
  3454. /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. */
  3455. samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
  3456. samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
  3457. samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
  3458. /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
  3459. riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
  3460. }
  3461. /* We store samples in groups of 4. */
  3462. _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
  3463. pDecodedSamples += 4;
  3464. }
  3465. /* Make sure we process the last few samples. */
  3466. i = (count & ~3);
  3467. while (i < (int)count) {
  3468. /* Rice extraction. */
  3469. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
  3470. return DRFLAC_FALSE;
  3471. }
  3472. /* Rice reconstruction. */
  3473. riceParamParts0 &= riceParamMask;
  3474. riceParamParts0 |= (zeroCountParts0 << riceParam);
  3475. riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
  3476. /* Sample reconstruction. */
  3477. pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
  3478. i += 1;
  3479. pDecodedSamples += 1;
  3480. }
  3481. return DRFLAC_TRUE;
  3482. }
  3483. static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
  3484. {
  3485. DRFLAC_ASSERT(bs != NULL);
  3486. DRFLAC_ASSERT(pSamplesOut != NULL);
  3487. /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
  3488. if (order > 0 && order <= 12) {
  3489. if (bitsPerSample+shift > 32) {
  3490. return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
  3491. } else {
  3492. return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
  3493. }
  3494. } else {
  3495. return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
  3496. }
  3497. }
  3498. #endif
  3499. #if defined(DRFLAC_SUPPORT_NEON)
  3500. static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
  3501. {
  3502. vst1q_s32(p+0, x.val[0]);
  3503. vst1q_s32(p+4, x.val[1]);
  3504. }
  3505. static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
  3506. {
  3507. vst1q_u32(p+0, x.val[0]);
  3508. vst1q_u32(p+4, x.val[1]);
  3509. }
  3510. static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
  3511. {
  3512. vst1q_f32(p+0, x.val[0]);
  3513. vst1q_f32(p+4, x.val[1]);
  3514. }
  3515. static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
  3516. {
  3517. vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
  3518. }
  3519. static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
  3520. {
  3521. vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
  3522. }
  3523. static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
  3524. {
  3525. drflac_int32 x[4];
  3526. x[3] = x3;
  3527. x[2] = x2;
  3528. x[1] = x1;
  3529. x[0] = x0;
  3530. return vld1q_s32(x);
  3531. }
  3532. static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
  3533. {
  3534. /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
  3535. /* Reference */
  3536. /*return drflac__vdupq_n_s32x4(
  3537. vgetq_lane_s32(a, 0),
  3538. vgetq_lane_s32(b, 3),
  3539. vgetq_lane_s32(b, 2),
  3540. vgetq_lane_s32(b, 1)
  3541. );*/
  3542. return vextq_s32(b, a, 1);
  3543. }
  3544. static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
  3545. {
  3546. /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
  3547. /* Reference */
  3548. /*return drflac__vdupq_n_s32x4(
  3549. vgetq_lane_s32(a, 0),
  3550. vgetq_lane_s32(b, 3),
  3551. vgetq_lane_s32(b, 2),
  3552. vgetq_lane_s32(b, 1)
  3553. );*/
  3554. return vextq_u32(b, a, 1);
  3555. }
  3556. static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
  3557. {
  3558. /* The sum must end up in position 0. */
  3559. /* Reference */
  3560. /*return vdupq_n_s32(
  3561. vgetq_lane_s32(x, 3) +
  3562. vgetq_lane_s32(x, 2) +
  3563. vgetq_lane_s32(x, 1) +
  3564. vgetq_lane_s32(x, 0)
  3565. );*/
  3566. int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
  3567. return vpadd_s32(r, r);
  3568. }
  3569. static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
  3570. {
  3571. return vadd_s64(vget_high_s64(x), vget_low_s64(x));
  3572. }
  3573. static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
  3574. {
  3575. /* Reference */
  3576. /*return drflac__vdupq_n_s32x4(
  3577. vgetq_lane_s32(x, 0),
  3578. vgetq_lane_s32(x, 1),
  3579. vgetq_lane_s32(x, 2),
  3580. vgetq_lane_s32(x, 3)
  3581. );*/
  3582. return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
  3583. }
  3584. static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
  3585. {
  3586. return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
  3587. }
  3588. static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
  3589. {
  3590. return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
  3591. }
  3592. static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
  3593. {
  3594. int i;
  3595. drflac_uint32 riceParamMask;
  3596. drflac_int32* pDecodedSamples = pSamplesOut;
  3597. drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
  3598. drflac_uint32 zeroCountParts[4];
  3599. drflac_uint32 riceParamParts[4];
  3600. int32x4_t coefficients128_0;
  3601. int32x4_t coefficients128_4;
  3602. int32x4_t coefficients128_8;
  3603. int32x4_t samples128_0;
  3604. int32x4_t samples128_4;
  3605. int32x4_t samples128_8;
  3606. uint32x4_t riceParamMask128;
  3607. int32x4_t riceParam128;
  3608. int32x2_t shift64;
  3609. uint32x4_t one128;
  3610. const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
  3611. riceParamMask = ~((~0UL) << riceParam);
  3612. riceParamMask128 = vdupq_n_u32(riceParamMask);
  3613. riceParam128 = vdupq_n_s32(riceParam);
  3614. shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
  3615. one128 = vdupq_n_u32(1);
  3616. /*
  3617. Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
  3618. what's available in the input buffers. It would be conenient to use a fall-through switch to do this, but this results
  3619. in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
  3620. so I think there's opportunity for this to be simplified.
  3621. */
  3622. {
  3623. int runningOrder = order;
  3624. drflac_int32 tempC[4] = {0, 0, 0, 0};
  3625. drflac_int32 tempS[4] = {0, 0, 0, 0};
  3626. /* 0 - 3. */
  3627. if (runningOrder >= 4) {
  3628. coefficients128_0 = vld1q_s32(coefficients + 0);
  3629. samples128_0 = vld1q_s32(pSamplesOut - 4);
  3630. runningOrder -= 4;
  3631. } else {
  3632. switch (runningOrder) {
  3633. case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
  3634. case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
  3635. case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
  3636. }
  3637. coefficients128_0 = vld1q_s32(tempC);
  3638. samples128_0 = vld1q_s32(tempS);
  3639. runningOrder = 0;
  3640. }
  3641. /* 4 - 7 */
  3642. if (runningOrder >= 4) {
  3643. coefficients128_4 = vld1q_s32(coefficients + 4);
  3644. samples128_4 = vld1q_s32(pSamplesOut - 8);
  3645. runningOrder -= 4;
  3646. } else {
  3647. switch (runningOrder) {
  3648. case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
  3649. case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
  3650. case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
  3651. }
  3652. coefficients128_4 = vld1q_s32(tempC);
  3653. samples128_4 = vld1q_s32(tempS);
  3654. runningOrder = 0;
  3655. }
  3656. /* 8 - 11 */
  3657. if (runningOrder == 4) {
  3658. coefficients128_8 = vld1q_s32(coefficients + 8);
  3659. samples128_8 = vld1q_s32(pSamplesOut - 12);
  3660. runningOrder -= 4;
  3661. } else {
  3662. switch (runningOrder) {
  3663. case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
  3664. case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
  3665. case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
  3666. }
  3667. coefficients128_8 = vld1q_s32(tempC);
  3668. samples128_8 = vld1q_s32(tempS);
  3669. runningOrder = 0;
  3670. }
  3671. /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
  3672. coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
  3673. coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
  3674. coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
  3675. }
  3676. /* For this version we are doing one sample at a time. */
  3677. while (pDecodedSamples < pDecodedSamplesEnd) {
  3678. int32x4_t prediction128;
  3679. int32x2_t prediction64;
  3680. uint32x4_t zeroCountPart128;
  3681. uint32x4_t riceParamPart128;
  3682. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
  3683. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
  3684. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
  3685. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
  3686. return DRFLAC_FALSE;
  3687. }
  3688. zeroCountPart128 = vld1q_u32(zeroCountParts);
  3689. riceParamPart128 = vld1q_u32(riceParamParts);
  3690. riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
  3691. riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
  3692. riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
  3693. if (order <= 4) {
  3694. for (i = 0; i < 4; i += 1) {
  3695. prediction128 = vmulq_s32(coefficients128_0, samples128_0);
  3696. /* Horizontal add and shift. */
  3697. prediction64 = drflac__vhaddq_s32(prediction128);
  3698. prediction64 = vshl_s32(prediction64, shift64);
  3699. prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
  3700. samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
  3701. riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
  3702. }
  3703. } else if (order <= 8) {
  3704. for (i = 0; i < 4; i += 1) {
  3705. prediction128 = vmulq_s32(coefficients128_4, samples128_4);
  3706. prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
  3707. /* Horizontal add and shift. */
  3708. prediction64 = drflac__vhaddq_s32(prediction128);
  3709. prediction64 = vshl_s32(prediction64, shift64);
  3710. prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
  3711. samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
  3712. samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
  3713. riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
  3714. }
  3715. } else {
  3716. for (i = 0; i < 4; i += 1) {
  3717. prediction128 = vmulq_s32(coefficients128_8, samples128_8);
  3718. prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
  3719. prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
  3720. /* Horizontal add and shift. */
  3721. prediction64 = drflac__vhaddq_s32(prediction128);
  3722. prediction64 = vshl_s32(prediction64, shift64);
  3723. prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
  3724. samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
  3725. samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
  3726. samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
  3727. riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
  3728. }
  3729. }
  3730. /* We store samples in groups of 4. */
  3731. vst1q_s32(pDecodedSamples, samples128_0);
  3732. pDecodedSamples += 4;
  3733. }
  3734. /* Make sure we process the last few samples. */
  3735. i = (count & ~3);
  3736. while (i < (int)count) {
  3737. /* Rice extraction. */
  3738. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
  3739. return DRFLAC_FALSE;
  3740. }
  3741. /* Rice reconstruction. */
  3742. riceParamParts[0] &= riceParamMask;
  3743. riceParamParts[0] |= (zeroCountParts[0] << riceParam);
  3744. riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
  3745. /* Sample reconstruction. */
  3746. pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
  3747. i += 1;
  3748. pDecodedSamples += 1;
  3749. }
  3750. return DRFLAC_TRUE;
  3751. }
  3752. static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
  3753. {
  3754. int i;
  3755. drflac_uint32 riceParamMask;
  3756. drflac_int32* pDecodedSamples = pSamplesOut;
  3757. drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
  3758. drflac_uint32 zeroCountParts[4];
  3759. drflac_uint32 riceParamParts[4];
  3760. int32x4_t coefficients128_0;
  3761. int32x4_t coefficients128_4;
  3762. int32x4_t coefficients128_8;
  3763. int32x4_t samples128_0;
  3764. int32x4_t samples128_4;
  3765. int32x4_t samples128_8;
  3766. uint32x4_t riceParamMask128;
  3767. int32x4_t riceParam128;
  3768. int64x1_t shift64;
  3769. uint32x4_t one128;
  3770. const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
  3771. riceParamMask = ~((~0UL) << riceParam);
  3772. riceParamMask128 = vdupq_n_u32(riceParamMask);
  3773. riceParam128 = vdupq_n_s32(riceParam);
  3774. shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
  3775. one128 = vdupq_n_u32(1);
  3776. /*
  3777. Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
  3778. what's available in the input buffers. It would be conenient to use a fall-through switch to do this, but this results
  3779. in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
  3780. so I think there's opportunity for this to be simplified.
  3781. */
  3782. {
  3783. int runningOrder = order;
  3784. drflac_int32 tempC[4] = {0, 0, 0, 0};
  3785. drflac_int32 tempS[4] = {0, 0, 0, 0};
  3786. /* 0 - 3. */
  3787. if (runningOrder >= 4) {
  3788. coefficients128_0 = vld1q_s32(coefficients + 0);
  3789. samples128_0 = vld1q_s32(pSamplesOut - 4);
  3790. runningOrder -= 4;
  3791. } else {
  3792. switch (runningOrder) {
  3793. case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
  3794. case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
  3795. case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
  3796. }
  3797. coefficients128_0 = vld1q_s32(tempC);
  3798. samples128_0 = vld1q_s32(tempS);
  3799. runningOrder = 0;
  3800. }
  3801. /* 4 - 7 */
  3802. if (runningOrder >= 4) {
  3803. coefficients128_4 = vld1q_s32(coefficients + 4);
  3804. samples128_4 = vld1q_s32(pSamplesOut - 8);
  3805. runningOrder -= 4;
  3806. } else {
  3807. switch (runningOrder) {
  3808. case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
  3809. case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
  3810. case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
  3811. }
  3812. coefficients128_4 = vld1q_s32(tempC);
  3813. samples128_4 = vld1q_s32(tempS);
  3814. runningOrder = 0;
  3815. }
  3816. /* 8 - 11 */
  3817. if (runningOrder == 4) {
  3818. coefficients128_8 = vld1q_s32(coefficients + 8);
  3819. samples128_8 = vld1q_s32(pSamplesOut - 12);
  3820. runningOrder -= 4;
  3821. } else {
  3822. switch (runningOrder) {
  3823. case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
  3824. case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
  3825. case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
  3826. }
  3827. coefficients128_8 = vld1q_s32(tempC);
  3828. samples128_8 = vld1q_s32(tempS);
  3829. runningOrder = 0;
  3830. }
  3831. /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
  3832. coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
  3833. coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
  3834. coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
  3835. }
  3836. /* For this version we are doing one sample at a time. */
  3837. while (pDecodedSamples < pDecodedSamplesEnd) {
  3838. int64x2_t prediction128;
  3839. uint32x4_t zeroCountPart128;
  3840. uint32x4_t riceParamPart128;
  3841. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
  3842. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
  3843. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
  3844. !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
  3845. return DRFLAC_FALSE;
  3846. }
  3847. zeroCountPart128 = vld1q_u32(zeroCountParts);
  3848. riceParamPart128 = vld1q_u32(riceParamParts);
  3849. riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
  3850. riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
  3851. riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
  3852. for (i = 0; i < 4; i += 1) {
  3853. int64x1_t prediction64;
  3854. prediction128 = veorq_s64(prediction128, prediction128); /* Reset to 0. */
  3855. switch (order)
  3856. {
  3857. case 12:
  3858. case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
  3859. case 10:
  3860. case 9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
  3861. case 8:
  3862. case 7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
  3863. case 6:
  3864. case 5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
  3865. case 4:
  3866. case 3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
  3867. case 2:
  3868. case 1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
  3869. }
  3870. /* Horizontal add and shift. */
  3871. prediction64 = drflac__vhaddq_s64(prediction128);
  3872. prediction64 = vshl_s64(prediction64, shift64);
  3873. prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));
  3874. /* Our value should be sitting in prediction64[0]. We need to combine this with our SSE samples. */
  3875. samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
  3876. samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
  3877. samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);
  3878. /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
  3879. riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
  3880. }
  3881. /* We store samples in groups of 4. */
  3882. vst1q_s32(pDecodedSamples, samples128_0);
  3883. pDecodedSamples += 4;
  3884. }
  3885. /* Make sure we process the last few samples. */
  3886. i = (count & ~3);
  3887. while (i < (int)count) {
  3888. /* Rice extraction. */
  3889. if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
  3890. return DRFLAC_FALSE;
  3891. }
  3892. /* Rice reconstruction. */
  3893. riceParamParts[0] &= riceParamMask;
  3894. riceParamParts[0] |= (zeroCountParts[0] << riceParam);
  3895. riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
  3896. /* Sample reconstruction. */
  3897. pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
  3898. i += 1;
  3899. pDecodedSamples += 1;
  3900. }
  3901. return DRFLAC_TRUE;
  3902. }
  3903. static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
  3904. {
  3905. DRFLAC_ASSERT(bs != NULL);
  3906. DRFLAC_ASSERT(pSamplesOut != NULL);
  3907. /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
  3908. if (order > 0 && order <= 12) {
  3909. if (bitsPerSample+shift > 32) {
  3910. return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
  3911. } else {
  3912. return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
  3913. }
  3914. } else {
  3915. return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
  3916. }
  3917. }
  3918. #endif
  3919. static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
  3920. {
  3921. #if defined(DRFLAC_SUPPORT_SSE41)
  3922. if (drflac__gIsSSE41Supported) {
  3923. return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
  3924. } else
  3925. #elif defined(DRFLAC_SUPPORT_NEON)
  3926. if (drflac__gIsNEONSupported) {
  3927. return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
  3928. } else
  3929. #endif
  3930. {
  3931. /* Scalar fallback. */
  3932. #if 0
  3933. return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
  3934. #else
  3935. return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
  3936. #endif
  3937. }
  3938. }
  3939. /* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
  3940. static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
  3941. {
  3942. drflac_uint32 i;
  3943. DRFLAC_ASSERT(bs != NULL);
  3944. for (i = 0; i < count; ++i) {
  3945. if (!drflac__seek_rice_parts(bs, riceParam)) {
  3946. return DRFLAC_FALSE;
  3947. }
  3948. }
  3949. return DRFLAC_TRUE;
  3950. }
  3951. static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
  3952. {
  3953. drflac_uint32 i;
  3954. DRFLAC_ASSERT(bs != NULL);
  3955. DRFLAC_ASSERT(unencodedBitsPerSample <= 31); /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
  3956. DRFLAC_ASSERT(pSamplesOut != NULL);
  3957. for (i = 0; i < count; ++i) {
  3958. if (unencodedBitsPerSample > 0) {
  3959. if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
  3960. return DRFLAC_FALSE;
  3961. }
  3962. } else {
  3963. pSamplesOut[i] = 0;
  3964. }
  3965. if (bitsPerSample >= 24) {
  3966. pSamplesOut[i] += drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + i);
  3967. } else {
  3968. pSamplesOut[i] += drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + i);
  3969. }
  3970. }
  3971. return DRFLAC_TRUE;
  3972. }
  3973. /*
  3974. Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
  3975. when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
  3976. <blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
  3977. */
  3978. static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
  3979. {
  3980. drflac_uint8 residualMethod;
  3981. drflac_uint8 partitionOrder;
  3982. drflac_uint32 samplesInPartition;
  3983. drflac_uint32 partitionsRemaining;
  3984. DRFLAC_ASSERT(bs != NULL);
  3985. DRFLAC_ASSERT(blockSize != 0);
  3986. DRFLAC_ASSERT(pDecodedSamples != NULL); /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */
  3987. if (!drflac__read_uint8(bs, 2, &residualMethod)) {
  3988. return DRFLAC_FALSE;
  3989. }
  3990. if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
  3991. return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
  3992. }
  3993. /* Ignore the first <order> values. */
  3994. pDecodedSamples += order;
  3995. if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
  3996. return DRFLAC_FALSE;
  3997. }
  3998. /*
  3999. From the FLAC spec:
  4000. The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
  4001. */
  4002. if (partitionOrder > 8) {
  4003. return DRFLAC_FALSE;
  4004. }
  4005. /* Validation check. */
  4006. if ((blockSize / (1 << partitionOrder)) < order) {
  4007. return DRFLAC_FALSE;
  4008. }
  4009. samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
  4010. partitionsRemaining = (1 << partitionOrder);
  4011. for (;;) {
  4012. drflac_uint8 riceParam = 0;
  4013. if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
  4014. if (!drflac__read_uint8(bs, 4, &riceParam)) {
  4015. return DRFLAC_FALSE;
  4016. }
  4017. if (riceParam == 15) {
  4018. riceParam = 0xFF;
  4019. }
  4020. } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
  4021. if (!drflac__read_uint8(bs, 5, &riceParam)) {
  4022. return DRFLAC_FALSE;
  4023. }
  4024. if (riceParam == 31) {
  4025. riceParam = 0xFF;
  4026. }
  4027. }
  4028. if (riceParam != 0xFF) {
  4029. if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, order, shift, coefficients, pDecodedSamples)) {
  4030. return DRFLAC_FALSE;
  4031. }
  4032. } else {
  4033. drflac_uint8 unencodedBitsPerSample = 0;
  4034. if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
  4035. return DRFLAC_FALSE;
  4036. }
  4037. if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, order, shift, coefficients, pDecodedSamples)) {
  4038. return DRFLAC_FALSE;
  4039. }
  4040. }
  4041. pDecodedSamples += samplesInPartition;
  4042. if (partitionsRemaining == 1) {
  4043. break;
  4044. }
  4045. partitionsRemaining -= 1;
  4046. if (partitionOrder != 0) {
  4047. samplesInPartition = blockSize / (1 << partitionOrder);
  4048. }
  4049. }
  4050. return DRFLAC_TRUE;
  4051. }
  4052. /*
  4053. Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
  4054. when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be set to 0. The
  4055. <blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
  4056. */
  4057. static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
  4058. {
  4059. drflac_uint8 residualMethod;
  4060. drflac_uint8 partitionOrder;
  4061. drflac_uint32 samplesInPartition;
  4062. drflac_uint32 partitionsRemaining;
  4063. DRFLAC_ASSERT(bs != NULL);
  4064. DRFLAC_ASSERT(blockSize != 0);
  4065. if (!drflac__read_uint8(bs, 2, &residualMethod)) {
  4066. return DRFLAC_FALSE;
  4067. }
  4068. if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
  4069. return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
  4070. }
  4071. if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
  4072. return DRFLAC_FALSE;
  4073. }
  4074. /*
  4075. From the FLAC spec:
  4076. The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
  4077. */
  4078. if (partitionOrder > 8) {
  4079. return DRFLAC_FALSE;
  4080. }
  4081. /* Validation check. */
  4082. if ((blockSize / (1 << partitionOrder)) <= order) {
  4083. return DRFLAC_FALSE;
  4084. }
  4085. samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
  4086. partitionsRemaining = (1 << partitionOrder);
  4087. for (;;)
  4088. {
  4089. drflac_uint8 riceParam = 0;
  4090. if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
  4091. if (!drflac__read_uint8(bs, 4, &riceParam)) {
  4092. return DRFLAC_FALSE;
  4093. }
  4094. if (riceParam == 15) {
  4095. riceParam = 0xFF;
  4096. }
  4097. } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
  4098. if (!drflac__read_uint8(bs, 5, &riceParam)) {
  4099. return DRFLAC_FALSE;
  4100. }
  4101. if (riceParam == 31) {
  4102. riceParam = 0xFF;
  4103. }
  4104. }
  4105. if (riceParam != 0xFF) {
  4106. if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
  4107. return DRFLAC_FALSE;
  4108. }
  4109. } else {
  4110. drflac_uint8 unencodedBitsPerSample = 0;
  4111. if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
  4112. return DRFLAC_FALSE;
  4113. }
  4114. if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
  4115. return DRFLAC_FALSE;
  4116. }
  4117. }
  4118. if (partitionsRemaining == 1) {
  4119. break;
  4120. }
  4121. partitionsRemaining -= 1;
  4122. samplesInPartition = blockSize / (1 << partitionOrder);
  4123. }
  4124. return DRFLAC_TRUE;
  4125. }
  4126. static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
  4127. {
  4128. drflac_uint32 i;
  4129. /* Only a single sample needs to be decoded here. */
  4130. drflac_int32 sample;
  4131. if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
  4132. return DRFLAC_FALSE;
  4133. }
  4134. /*
  4135. We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
  4136. we'll want to look at a more efficient way.
  4137. */
  4138. for (i = 0; i < blockSize; ++i) {
  4139. pDecodedSamples[i] = sample;
  4140. }
  4141. return DRFLAC_TRUE;
  4142. }
  4143. static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
  4144. {
  4145. drflac_uint32 i;
  4146. for (i = 0; i < blockSize; ++i) {
  4147. drflac_int32 sample;
  4148. if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
  4149. return DRFLAC_FALSE;
  4150. }
  4151. pDecodedSamples[i] = sample;
  4152. }
  4153. return DRFLAC_TRUE;
  4154. }
  4155. static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
  4156. {
  4157. drflac_uint32 i;
  4158. static drflac_int32 lpcCoefficientsTable[5][4] = {
  4159. {0, 0, 0, 0},
  4160. {1, 0, 0, 0},
  4161. {2, -1, 0, 0},
  4162. {3, -3, 1, 0},
  4163. {4, -6, 4, -1}
  4164. };
  4165. /* Warm up samples and coefficients. */
  4166. for (i = 0; i < lpcOrder; ++i) {
  4167. drflac_int32 sample;
  4168. if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
  4169. return DRFLAC_FALSE;
  4170. }
  4171. pDecodedSamples[i] = sample;
  4172. }
  4173. if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
  4174. return DRFLAC_FALSE;
  4175. }
  4176. return DRFLAC_TRUE;
  4177. }
  4178. static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
  4179. {
  4180. drflac_uint8 i;
  4181. drflac_uint8 lpcPrecision;
  4182. drflac_int8 lpcShift;
  4183. drflac_int32 coefficients[32];
  4184. /* Warm up samples. */
  4185. for (i = 0; i < lpcOrder; ++i) {
  4186. drflac_int32 sample;
  4187. if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
  4188. return DRFLAC_FALSE;
  4189. }
  4190. pDecodedSamples[i] = sample;
  4191. }
  4192. if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
  4193. return DRFLAC_FALSE;
  4194. }
  4195. if (lpcPrecision == 15) {
  4196. return DRFLAC_FALSE; /* Invalid. */
  4197. }
  4198. lpcPrecision += 1;
  4199. if (!drflac__read_int8(bs, 5, &lpcShift)) {
  4200. return DRFLAC_FALSE;
  4201. }
  4202. /*
  4203. From the FLAC specification:
  4204. Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)
  4205. Emphasis on the "signed two's-complement". In practice there does not seem to be any encoders nor decoders supporting negative shifts. For now dr_flac is
  4206. not going to support negative shifts as I don't have any reference files. However, when a reference file comes through I will consider adding support.
  4207. */
  4208. if (lpcShift < 0) {
  4209. return DRFLAC_FALSE;
  4210. }
  4211. DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
  4212. for (i = 0; i < lpcOrder; ++i) {
  4213. if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
  4214. return DRFLAC_FALSE;
  4215. }
  4216. }
  4217. if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, coefficients, pDecodedSamples)) {
  4218. return DRFLAC_FALSE;
  4219. }
  4220. return DRFLAC_TRUE;
  4221. }
  4222. static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
  4223. {
  4224. const drflac_uint32 sampleRateTable[12] = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
  4225. const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1}; /* -1 = reserved. */
  4226. DRFLAC_ASSERT(bs != NULL);
  4227. DRFLAC_ASSERT(header != NULL);
  4228. /* Keep looping until we find a valid sync code. */
  4229. for (;;) {
  4230. drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
  4231. drflac_uint8 reserved = 0;
  4232. drflac_uint8 blockingStrategy = 0;
  4233. drflac_uint8 blockSize = 0;
  4234. drflac_uint8 sampleRate = 0;
  4235. drflac_uint8 channelAssignment = 0;
  4236. drflac_uint8 bitsPerSample = 0;
  4237. drflac_bool32 isVariableBlockSize;
  4238. if (!drflac__find_and_seek_to_next_sync_code(bs)) {
  4239. return DRFLAC_FALSE;
  4240. }
  4241. if (!drflac__read_uint8(bs, 1, &reserved)) {
  4242. return DRFLAC_FALSE;
  4243. }
  4244. if (reserved == 1) {
  4245. continue;
  4246. }
  4247. crc8 = drflac_crc8(crc8, reserved, 1);
  4248. if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
  4249. return DRFLAC_FALSE;
  4250. }
  4251. crc8 = drflac_crc8(crc8, blockingStrategy, 1);
  4252. if (!drflac__read_uint8(bs, 4, &blockSize)) {
  4253. return DRFLAC_FALSE;
  4254. }
  4255. if (blockSize == 0) {
  4256. continue;
  4257. }
  4258. crc8 = drflac_crc8(crc8, blockSize, 4);
  4259. if (!drflac__read_uint8(bs, 4, &sampleRate)) {
  4260. return DRFLAC_FALSE;
  4261. }
  4262. crc8 = drflac_crc8(crc8, sampleRate, 4);
  4263. if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
  4264. return DRFLAC_FALSE;
  4265. }
  4266. if (channelAssignment > 10) {
  4267. continue;
  4268. }
  4269. crc8 = drflac_crc8(crc8, channelAssignment, 4);
  4270. if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
  4271. return DRFLAC_FALSE;
  4272. }
  4273. if (bitsPerSample == 3 || bitsPerSample == 7) {
  4274. continue;
  4275. }
  4276. crc8 = drflac_crc8(crc8, bitsPerSample, 3);
  4277. if (!drflac__read_uint8(bs, 1, &reserved)) {
  4278. return DRFLAC_FALSE;
  4279. }
  4280. if (reserved == 1) {
  4281. continue;
  4282. }
  4283. crc8 = drflac_crc8(crc8, reserved, 1);
  4284. isVariableBlockSize = blockingStrategy == 1;
  4285. if (isVariableBlockSize) {
  4286. drflac_uint64 pcmFrameNumber;
  4287. drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
  4288. if (result != DRFLAC_SUCCESS) {
  4289. if (result == DRFLAC_AT_END) {
  4290. return DRFLAC_FALSE;
  4291. } else {
  4292. continue;
  4293. }
  4294. }
  4295. header->flacFrameNumber = 0;
  4296. header->pcmFrameNumber = pcmFrameNumber;
  4297. } else {
  4298. drflac_uint64 flacFrameNumber = 0;
  4299. drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
  4300. if (result != DRFLAC_SUCCESS) {
  4301. if (result == DRFLAC_AT_END) {
  4302. return DRFLAC_FALSE;
  4303. } else {
  4304. continue;
  4305. }
  4306. }
  4307. header->flacFrameNumber = (drflac_uint32)flacFrameNumber; /* <-- Safe cast. */
  4308. header->pcmFrameNumber = 0;
  4309. }
  4310. DRFLAC_ASSERT(blockSize > 0);
  4311. if (blockSize == 1) {
  4312. header->blockSizeInPCMFrames = 192;
  4313. } else if (blockSize <= 5) {
  4314. DRFLAC_ASSERT(blockSize >= 2);
  4315. header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
  4316. } else if (blockSize == 6) {
  4317. if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
  4318. return DRFLAC_FALSE;
  4319. }
  4320. crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
  4321. header->blockSizeInPCMFrames += 1;
  4322. } else if (blockSize == 7) {
  4323. if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
  4324. return DRFLAC_FALSE;
  4325. }
  4326. crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
  4327. header->blockSizeInPCMFrames += 1;
  4328. } else {
  4329. DRFLAC_ASSERT(blockSize >= 8);
  4330. header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
  4331. }
  4332. if (sampleRate <= 11) {
  4333. header->sampleRate = sampleRateTable[sampleRate];
  4334. } else if (sampleRate == 12) {
  4335. if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
  4336. return DRFLAC_FALSE;
  4337. }
  4338. crc8 = drflac_crc8(crc8, header->sampleRate, 8);
  4339. header->sampleRate *= 1000;
  4340. } else if (sampleRate == 13) {
  4341. if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
  4342. return DRFLAC_FALSE;
  4343. }
  4344. crc8 = drflac_crc8(crc8, header->sampleRate, 16);
  4345. } else if (sampleRate == 14) {
  4346. if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
  4347. return DRFLAC_FALSE;
  4348. }
  4349. crc8 = drflac_crc8(crc8, header->sampleRate, 16);
  4350. header->sampleRate *= 10;
  4351. } else {
  4352. continue; /* Invalid. Assume an invalid block. */
  4353. }
  4354. header->channelAssignment = channelAssignment;
  4355. header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
  4356. if (header->bitsPerSample == 0) {
  4357. header->bitsPerSample = streaminfoBitsPerSample;
  4358. }
  4359. if (!drflac__read_uint8(bs, 8, &header->crc8)) {
  4360. return DRFLAC_FALSE;
  4361. }
  4362. #ifndef DR_FLAC_NO_CRC
  4363. if (header->crc8 != crc8) {
  4364. continue; /* CRC mismatch. Loop back to the top and find the next sync code. */
  4365. }
  4366. #endif
  4367. return DRFLAC_TRUE;
  4368. }
  4369. }
  4370. static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
  4371. {
  4372. drflac_uint8 header;
  4373. int type;
  4374. if (!drflac__read_uint8(bs, 8, &header)) {
  4375. return DRFLAC_FALSE;
  4376. }
  4377. /* First bit should always be 0. */
  4378. if ((header & 0x80) != 0) {
  4379. return DRFLAC_FALSE;
  4380. }
  4381. type = (header & 0x7E) >> 1;
  4382. if (type == 0) {
  4383. pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
  4384. } else if (type == 1) {
  4385. pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
  4386. } else {
  4387. if ((type & 0x20) != 0) {
  4388. pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
  4389. pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
  4390. } else if ((type & 0x08) != 0) {
  4391. pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
  4392. pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
  4393. if (pSubframe->lpcOrder > 4) {
  4394. pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
  4395. pSubframe->lpcOrder = 0;
  4396. }
  4397. } else {
  4398. pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
  4399. }
  4400. }
  4401. if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
  4402. return DRFLAC_FALSE;
  4403. }
  4404. /* Wasted bits per sample. */
  4405. pSubframe->wastedBitsPerSample = 0;
  4406. if ((header & 0x01) == 1) {
  4407. unsigned int wastedBitsPerSample;
  4408. if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
  4409. return DRFLAC_FALSE;
  4410. }
  4411. pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
  4412. }
  4413. return DRFLAC_TRUE;
  4414. }
  4415. static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
  4416. {
  4417. drflac_subframe* pSubframe;
  4418. drflac_uint32 subframeBitsPerSample;
  4419. DRFLAC_ASSERT(bs != NULL);
  4420. DRFLAC_ASSERT(frame != NULL);
  4421. pSubframe = frame->subframes + subframeIndex;
  4422. if (!drflac__read_subframe_header(bs, pSubframe)) {
  4423. return DRFLAC_FALSE;
  4424. }
  4425. /* Side channels require an extra bit per sample. Took a while to figure that one out... */
  4426. subframeBitsPerSample = frame->header.bitsPerSample;
  4427. if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
  4428. subframeBitsPerSample += 1;
  4429. } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
  4430. subframeBitsPerSample += 1;
  4431. }
  4432. /* Need to handle wasted bits per sample. */
  4433. if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
  4434. return DRFLAC_FALSE;
  4435. }
  4436. subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
  4437. pSubframe->pSamplesS32 = pDecodedSamplesOut;
  4438. switch (pSubframe->subframeType)
  4439. {
  4440. case DRFLAC_SUBFRAME_CONSTANT:
  4441. {
  4442. drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
  4443. } break;
  4444. case DRFLAC_SUBFRAME_VERBATIM:
  4445. {
  4446. drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
  4447. } break;
  4448. case DRFLAC_SUBFRAME_FIXED:
  4449. {
  4450. drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
  4451. } break;
  4452. case DRFLAC_SUBFRAME_LPC:
  4453. {
  4454. drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
  4455. } break;
  4456. default: return DRFLAC_FALSE;
  4457. }
  4458. return DRFLAC_TRUE;
  4459. }
  4460. static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
  4461. {
  4462. drflac_subframe* pSubframe;
  4463. drflac_uint32 subframeBitsPerSample;
  4464. DRFLAC_ASSERT(bs != NULL);
  4465. DRFLAC_ASSERT(frame != NULL);
  4466. pSubframe = frame->subframes + subframeIndex;
  4467. if (!drflac__read_subframe_header(bs, pSubframe)) {
  4468. return DRFLAC_FALSE;
  4469. }
  4470. /* Side channels require an extra bit per sample. Took a while to figure that one out... */
  4471. subframeBitsPerSample = frame->header.bitsPerSample;
  4472. if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
  4473. subframeBitsPerSample += 1;
  4474. } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
  4475. subframeBitsPerSample += 1;
  4476. }
  4477. /* Need to handle wasted bits per sample. */
  4478. if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
  4479. return DRFLAC_FALSE;
  4480. }
  4481. subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
  4482. pSubframe->pSamplesS32 = NULL;
  4483. switch (pSubframe->subframeType)
  4484. {
  4485. case DRFLAC_SUBFRAME_CONSTANT:
  4486. {
  4487. if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
  4488. return DRFLAC_FALSE;
  4489. }
  4490. } break;
  4491. case DRFLAC_SUBFRAME_VERBATIM:
  4492. {
  4493. unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
  4494. if (!drflac__seek_bits(bs, bitsToSeek)) {
  4495. return DRFLAC_FALSE;
  4496. }
  4497. } break;
  4498. case DRFLAC_SUBFRAME_FIXED:
  4499. {
  4500. unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
  4501. if (!drflac__seek_bits(bs, bitsToSeek)) {
  4502. return DRFLAC_FALSE;
  4503. }
  4504. if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
  4505. return DRFLAC_FALSE;
  4506. }
  4507. } break;
  4508. case DRFLAC_SUBFRAME_LPC:
  4509. {
  4510. drflac_uint8 lpcPrecision;
  4511. unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
  4512. if (!drflac__seek_bits(bs, bitsToSeek)) {
  4513. return DRFLAC_FALSE;
  4514. }
  4515. if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
  4516. return DRFLAC_FALSE;
  4517. }
  4518. if (lpcPrecision == 15) {
  4519. return DRFLAC_FALSE; /* Invalid. */
  4520. }
  4521. lpcPrecision += 1;
  4522. bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5; /* +5 for shift. */
  4523. if (!drflac__seek_bits(bs, bitsToSeek)) {
  4524. return DRFLAC_FALSE;
  4525. }
  4526. if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
  4527. return DRFLAC_FALSE;
  4528. }
  4529. } break;
  4530. default: return DRFLAC_FALSE;
  4531. }
  4532. return DRFLAC_TRUE;
  4533. }
  4534. static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
  4535. {
  4536. drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};
  4537. DRFLAC_ASSERT(channelAssignment <= 10);
  4538. return lookup[channelAssignment];
  4539. }
  4540. static drflac_result drflac__decode_flac_frame(drflac* pFlac)
  4541. {
  4542. int channelCount;
  4543. int i;
  4544. drflac_uint8 paddingSizeInBits;
  4545. drflac_uint16 desiredCRC16;
  4546. #ifndef DR_FLAC_NO_CRC
  4547. drflac_uint16 actualCRC16;
  4548. #endif
  4549. /* This function should be called while the stream is sitting on the first byte after the frame header. */
  4550. DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
  4551. /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
  4552. if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
  4553. return DRFLAC_ERROR;
  4554. }
  4555. /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
  4556. channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
  4557. if (channelCount != (int)pFlac->channels) {
  4558. return DRFLAC_ERROR;
  4559. }
  4560. for (i = 0; i < channelCount; ++i) {
  4561. if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
  4562. return DRFLAC_ERROR;
  4563. }
  4564. }
  4565. paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
  4566. if (paddingSizeInBits > 0) {
  4567. drflac_uint8 padding = 0;
  4568. if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
  4569. return DRFLAC_AT_END;
  4570. }
  4571. }
  4572. #ifndef DR_FLAC_NO_CRC
  4573. actualCRC16 = drflac__flush_crc16(&pFlac->bs);
  4574. #endif
  4575. if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
  4576. return DRFLAC_AT_END;
  4577. }
  4578. #ifndef DR_FLAC_NO_CRC
  4579. if (actualCRC16 != desiredCRC16) {
  4580. return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
  4581. }
  4582. #endif
  4583. pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
  4584. return DRFLAC_SUCCESS;
  4585. }
  4586. static drflac_result drflac__seek_flac_frame(drflac* pFlac)
  4587. {
  4588. int channelCount;
  4589. int i;
  4590. drflac_uint16 desiredCRC16;
  4591. #ifndef DR_FLAC_NO_CRC
  4592. drflac_uint16 actualCRC16;
  4593. #endif
  4594. channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
  4595. for (i = 0; i < channelCount; ++i) {
  4596. if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
  4597. return DRFLAC_ERROR;
  4598. }
  4599. }
  4600. /* Padding. */
  4601. if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
  4602. return DRFLAC_ERROR;
  4603. }
  4604. /* CRC. */
  4605. #ifndef DR_FLAC_NO_CRC
  4606. actualCRC16 = drflac__flush_crc16(&pFlac->bs);
  4607. #endif
  4608. if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
  4609. return DRFLAC_AT_END;
  4610. }
  4611. #ifndef DR_FLAC_NO_CRC
  4612. if (actualCRC16 != desiredCRC16) {
  4613. return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
  4614. }
  4615. #endif
  4616. return DRFLAC_SUCCESS;
  4617. }
  4618. static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
  4619. {
  4620. DRFLAC_ASSERT(pFlac != NULL);
  4621. for (;;) {
  4622. drflac_result result;
  4623. if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
  4624. return DRFLAC_FALSE;
  4625. }
  4626. result = drflac__decode_flac_frame(pFlac);
  4627. if (result != DRFLAC_SUCCESS) {
  4628. if (result == DRFLAC_CRC_MISMATCH) {
  4629. continue; /* CRC mismatch. Skip to the next frame. */
  4630. } else {
  4631. return DRFLAC_FALSE;
  4632. }
  4633. }
  4634. return DRFLAC_TRUE;
  4635. }
  4636. }
  4637. static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
  4638. {
  4639. drflac_uint64 firstPCMFrame;
  4640. drflac_uint64 lastPCMFrame;
  4641. DRFLAC_ASSERT(pFlac != NULL);
  4642. firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
  4643. if (firstPCMFrame == 0) {
  4644. firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
  4645. }
  4646. lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
  4647. if (lastPCMFrame > 0) {
  4648. lastPCMFrame -= 1; /* Needs to be zero based. */
  4649. }
  4650. if (pFirstPCMFrame) {
  4651. *pFirstPCMFrame = firstPCMFrame;
  4652. }
  4653. if (pLastPCMFrame) {
  4654. *pLastPCMFrame = lastPCMFrame;
  4655. }
  4656. }
  4657. static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
  4658. {
  4659. drflac_bool32 result;
  4660. DRFLAC_ASSERT(pFlac != NULL);
  4661. result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
  4662. DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
  4663. pFlac->currentPCMFrame = 0;
  4664. return result;
  4665. }
  4666. static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
  4667. {
  4668. /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
  4669. DRFLAC_ASSERT(pFlac != NULL);
  4670. return drflac__seek_flac_frame(pFlac);
  4671. }
  4672. static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
  4673. {
  4674. drflac_uint64 pcmFramesRead = 0;
  4675. while (pcmFramesToSeek > 0) {
  4676. if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
  4677. if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
  4678. break; /* Couldn't read the next frame, so just break from the loop and return. */
  4679. }
  4680. } else {
  4681. if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
  4682. pcmFramesRead += pcmFramesToSeek;
  4683. pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek; /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. */
  4684. pcmFramesToSeek = 0;
  4685. } else {
  4686. pcmFramesRead += pFlac->currentFLACFrame.pcmFramesRemaining;
  4687. pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
  4688. pFlac->currentFLACFrame.pcmFramesRemaining = 0;
  4689. }
  4690. }
  4691. }
  4692. pFlac->currentPCMFrame += pcmFramesRead;
  4693. return pcmFramesRead;
  4694. }
  4695. static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
  4696. {
  4697. drflac_bool32 isMidFrame = DRFLAC_FALSE;
  4698. drflac_uint64 runningPCMFrameCount;
  4699. DRFLAC_ASSERT(pFlac != NULL);
  4700. /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
  4701. if (pcmFrameIndex >= pFlac->currentPCMFrame) {
  4702. /* Seeking forward. Need to seek from the current position. */
  4703. runningPCMFrameCount = pFlac->currentPCMFrame;
  4704. /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
  4705. if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
  4706. if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
  4707. return DRFLAC_FALSE;
  4708. }
  4709. } else {
  4710. isMidFrame = DRFLAC_TRUE;
  4711. }
  4712. } else {
  4713. /* Seeking backwards. Need to seek from the start of the file. */
  4714. runningPCMFrameCount = 0;
  4715. /* Move back to the start. */
  4716. if (!drflac__seek_to_first_frame(pFlac)) {
  4717. return DRFLAC_FALSE;
  4718. }
  4719. /* Decode the first frame in preparation for sample-exact seeking below. */
  4720. if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
  4721. return DRFLAC_FALSE;
  4722. }
  4723. }
  4724. /*
  4725. We need to as quickly as possible find the frame that contains the target sample. To do this, we iterate over each frame and inspect its
  4726. header. If based on the header we can determine that the frame contains the sample, we do a full decode of that frame.
  4727. */
  4728. for (;;) {
  4729. drflac_uint64 pcmFrameCountInThisFLACFrame;
  4730. drflac_uint64 firstPCMFrameInFLACFrame = 0;
  4731. drflac_uint64 lastPCMFrameInFLACFrame = 0;
  4732. drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
  4733. pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
  4734. if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
  4735. /*
  4736. The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
  4737. it never existed and keep iterating.
  4738. */
  4739. drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
  4740. if (!isMidFrame) {
  4741. drflac_result result = drflac__decode_flac_frame(pFlac);
  4742. if (result == DRFLAC_SUCCESS) {
  4743. /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
  4744. return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
  4745. } else {
  4746. if (result == DRFLAC_CRC_MISMATCH) {
  4747. goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
  4748. } else {
  4749. return DRFLAC_FALSE;
  4750. }
  4751. }
  4752. } else {
  4753. /* We started seeking mid-frame which means we need to skip the frame decoding part. */
  4754. return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
  4755. }
  4756. } else {
  4757. /*
  4758. It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
  4759. frame never existed and leave the running sample count untouched.
  4760. */
  4761. if (!isMidFrame) {
  4762. drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
  4763. if (result == DRFLAC_SUCCESS) {
  4764. runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
  4765. } else {
  4766. if (result == DRFLAC_CRC_MISMATCH) {
  4767. goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
  4768. } else {
  4769. return DRFLAC_FALSE;
  4770. }
  4771. }
  4772. } else {
  4773. /*
  4774. We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
  4775. drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
  4776. */
  4777. runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
  4778. pFlac->currentFLACFrame.pcmFramesRemaining = 0;
  4779. isMidFrame = DRFLAC_FALSE;
  4780. }
  4781. /* If we are seeking to the end of the file and we've just hit it, we're done. */
  4782. if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
  4783. return DRFLAC_TRUE;
  4784. }
  4785. }
  4786. next_iteration:
  4787. /* Grab the next frame in preparation for the next iteration. */
  4788. if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
  4789. return DRFLAC_FALSE;
  4790. }
  4791. }
  4792. }
  4793. #if !defined(DR_FLAC_NO_CRC)
  4794. /*
  4795. We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
  4796. uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting
  4797. location.
  4798. */
  4799. #define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
  4800. static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
  4801. {
  4802. DRFLAC_ASSERT(pFlac != NULL);
  4803. DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
  4804. DRFLAC_ASSERT(targetByte >= rangeLo);
  4805. DRFLAC_ASSERT(targetByte <= rangeHi);
  4806. *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
  4807. for (;;) {
  4808. /* After rangeLo == rangeHi == targetByte fails, we need to break out. */
  4809. drflac_uint64 lastTargetByte = targetByte;
  4810. /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. */
  4811. if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
  4812. /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
  4813. if (targetByte == 0) {
  4814. drflac__seek_to_first_frame(pFlac); /* Try to recover. */
  4815. return DRFLAC_FALSE;
  4816. }
  4817. /* Halve the byte location and continue. */
  4818. targetByte = rangeLo + ((rangeHi - rangeLo)/2);
  4819. rangeHi = targetByte;
  4820. } else {
  4821. /* Getting here should mean that we have seeked to an appropriate byte. */
  4822. /* Clear the details of the FLAC frame so we don't misreport data. */
  4823. DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
  4824. /*
  4825. Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
  4826. CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
  4827. so it needs to stay this way for now.
  4828. */
  4829. #if 1
  4830. if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
  4831. /* Halve the byte location and continue. */
  4832. targetByte = rangeLo + ((rangeHi - rangeLo)/2);
  4833. rangeHi = targetByte;
  4834. } else {
  4835. break;
  4836. }
  4837. #else
  4838. if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
  4839. /* Halve the byte location and continue. */
  4840. targetByte = rangeLo + ((rangeHi - rangeLo)/2);
  4841. rangeHi = targetByte;
  4842. } else {
  4843. break;
  4844. }
  4845. #endif
  4846. }
  4847. /* We already tried this byte and there are no more to try, break out. */
  4848. if(targetByte == lastTargetByte) {
  4849. return DRFLAC_FALSE;
  4850. }
  4851. }
  4852. /* The current PCM frame needs to be updated based on the frame we just seeked to. */
  4853. drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
  4854. DRFLAC_ASSERT(targetByte <= rangeHi);
  4855. *pLastSuccessfulSeekOffset = targetByte;
  4856. return DRFLAC_TRUE;
  4857. }
  4858. static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
  4859. {
  4860. /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
  4861. #if 0
  4862. if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
  4863. /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
  4864. if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
  4865. return DRFLAC_FALSE;
  4866. }
  4867. }
  4868. #endif
  4869. return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
  4870. }
  4871. static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
  4872. {
  4873. /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */
  4874. drflac_uint64 targetByte;
  4875. drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
  4876. drflac_uint64 pcmRangeHi = 0;
  4877. drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
  4878. drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
  4879. drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
  4880. targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
  4881. if (targetByte > byteRangeHi) {
  4882. targetByte = byteRangeHi;
  4883. }
  4884. for (;;) {
  4885. if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
  4886. /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
  4887. drflac_uint64 newPCMRangeLo;
  4888. drflac_uint64 newPCMRangeHi;
  4889. drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
  4890. /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
  4891. if (pcmRangeLo == newPCMRangeLo) {
  4892. if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
  4893. break; /* Failed to seek to closest frame. */
  4894. }
  4895. if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
  4896. return DRFLAC_TRUE;
  4897. } else {
  4898. break; /* Failed to seek forward. */
  4899. }
  4900. }
  4901. pcmRangeLo = newPCMRangeLo;
  4902. pcmRangeHi = newPCMRangeHi;
  4903. if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
  4904. /* The target PCM frame is in this FLAC frame. */
  4905. if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
  4906. return DRFLAC_TRUE;
  4907. } else {
  4908. break; /* Failed to seek to FLAC frame. */
  4909. }
  4910. } else {
  4911. const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);
  4912. if (pcmRangeLo > pcmFrameIndex) {
  4913. /* We seeked too far forward. We need to move our target byte backward and try again. */
  4914. byteRangeHi = lastSuccessfulSeekOffset;
  4915. if (byteRangeLo > byteRangeHi) {
  4916. byteRangeLo = byteRangeHi;
  4917. }
  4918. targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
  4919. if (targetByte < byteRangeLo) {
  4920. targetByte = byteRangeLo;
  4921. }
  4922. } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
  4923. /* We didn't seek far enough. We need to move our target byte forward and try again. */
  4924. /* If we're close enough we can just seek forward. */
  4925. if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
  4926. if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
  4927. return DRFLAC_TRUE;
  4928. } else {
  4929. break; /* Failed to seek to FLAC frame. */
  4930. }
  4931. } else {
  4932. byteRangeLo = lastSuccessfulSeekOffset;
  4933. if (byteRangeHi < byteRangeLo) {
  4934. byteRangeHi = byteRangeLo;
  4935. }
  4936. targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
  4937. if (targetByte > byteRangeHi) {
  4938. targetByte = byteRangeHi;
  4939. }
  4940. if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
  4941. closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
  4942. }
  4943. }
  4944. }
  4945. }
  4946. } else {
  4947. /* Getting here is really bad. We just recover as best we can, but moving to the first frame in the stream, and then abort. */
  4948. break;
  4949. }
  4950. }
  4951. drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
  4952. return DRFLAC_FALSE;
  4953. }
  4954. static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
  4955. {
  4956. drflac_uint64 byteRangeLo;
  4957. drflac_uint64 byteRangeHi;
  4958. drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
  4959. /* Our algorithm currently assumes the FLAC stream is currently sitting at the start. */
  4960. if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
  4961. return DRFLAC_FALSE;
  4962. }
  4963. /* If we're close enough to the start, just move to the start and seek forward. */
  4964. if (pcmFrameIndex < seekForwardThreshold) {
  4965. return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
  4966. }
  4967. /*
  4968. Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
  4969. the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
  4970. */
  4971. byteRangeLo = pFlac->firstFLACFramePosInBytes;
  4972. byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
  4973. return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
  4974. }
  4975. #endif /* !DR_FLAC_NO_CRC */
  4976. static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
  4977. {
  4978. drflac_uint32 iClosestSeekpoint = 0;
  4979. drflac_bool32 isMidFrame = DRFLAC_FALSE;
  4980. drflac_uint64 runningPCMFrameCount;
  4981. drflac_uint32 iSeekpoint;
  4982. DRFLAC_ASSERT(pFlac != NULL);
  4983. if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
  4984. return DRFLAC_FALSE;
  4985. }
  4986. for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
  4987. if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
  4988. break;
  4989. }
  4990. iClosestSeekpoint = iSeekpoint;
  4991. }
  4992. /* There's been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
  4993. if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
  4994. return DRFLAC_FALSE;
  4995. }
  4996. if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
  4997. return DRFLAC_FALSE;
  4998. }
  4999. #if !defined(DR_FLAC_NO_CRC)
  5000. /* At this point we should know the closest seek point. We can use a binary search for this. We need to know the total sample count for this. */
  5001. if (pFlac->totalPCMFrameCount > 0) {
  5002. drflac_uint64 byteRangeLo;
  5003. drflac_uint64 byteRangeHi;
  5004. byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
  5005. byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
  5006. /*
  5007. If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
  5008. value for byteRangeHi which will clamp it appropriately.
  5009. Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
  5010. have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
  5011. */
  5012. if (iClosestSeekpoint < pFlac->seekpointCount-1) {
  5013. drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;
  5014. /* Basic validation on the seekpoints to ensure they're usable. */
  5015. if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
  5016. return DRFLAC_FALSE; /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
  5017. }
  5018. if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
  5019. byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. */
  5020. }
  5021. }
  5022. if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
  5023. if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
  5024. drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
  5025. if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
  5026. return DRFLAC_TRUE;
  5027. }
  5028. }
  5029. }
  5030. }
  5031. #endif /* !DR_FLAC_NO_CRC */
  5032. /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */
  5033. /*
  5034. If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
  5035. from the seekpoint's first sample.
  5036. */
  5037. if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
  5038. /* Optimized case. Just seek forward from where we are. */
  5039. runningPCMFrameCount = pFlac->currentPCMFrame;
  5040. /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
  5041. if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
  5042. if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
  5043. return DRFLAC_FALSE;
  5044. }
  5045. } else {
  5046. isMidFrame = DRFLAC_TRUE;
  5047. }
  5048. } else {
  5049. /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
  5050. runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
  5051. if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
  5052. return DRFLAC_FALSE;
  5053. }
  5054. /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
  5055. if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
  5056. return DRFLAC_FALSE;
  5057. }
  5058. }
  5059. for (;;) {
  5060. drflac_uint64 pcmFrameCountInThisFLACFrame;
  5061. drflac_uint64 firstPCMFrameInFLACFrame = 0;
  5062. drflac_uint64 lastPCMFrameInFLACFrame = 0;
  5063. drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
  5064. pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
  5065. if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
  5066. /*
  5067. The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
  5068. it never existed and keep iterating.
  5069. */
  5070. drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
  5071. if (!isMidFrame) {
  5072. drflac_result result = drflac__decode_flac_frame(pFlac);
  5073. if (result == DRFLAC_SUCCESS) {
  5074. /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
  5075. return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
  5076. } else {
  5077. if (result == DRFLAC_CRC_MISMATCH) {
  5078. goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
  5079. } else {
  5080. return DRFLAC_FALSE;
  5081. }
  5082. }
  5083. } else {
  5084. /* We started seeking mid-frame which means we need to skip the frame decoding part. */
  5085. return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
  5086. }
  5087. } else {
  5088. /*
  5089. It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
  5090. frame never existed and leave the running sample count untouched.
  5091. */
  5092. if (!isMidFrame) {
  5093. drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
  5094. if (result == DRFLAC_SUCCESS) {
  5095. runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
  5096. } else {
  5097. if (result == DRFLAC_CRC_MISMATCH) {
  5098. goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
  5099. } else {
  5100. return DRFLAC_FALSE;
  5101. }
  5102. }
  5103. } else {
  5104. /*
  5105. We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
  5106. drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
  5107. */
  5108. runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
  5109. pFlac->currentFLACFrame.pcmFramesRemaining = 0;
  5110. isMidFrame = DRFLAC_FALSE;
  5111. }
  5112. /* If we are seeking to the end of the file and we've just hit it, we're done. */
  5113. if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
  5114. return DRFLAC_TRUE;
  5115. }
  5116. }
  5117. next_iteration:
  5118. /* Grab the next frame in preparation for the next iteration. */
  5119. if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
  5120. return DRFLAC_FALSE;
  5121. }
  5122. }
  5123. }
  5124. #ifndef DR_FLAC_NO_OGG
  5125. typedef struct
  5126. {
  5127. drflac_uint8 capturePattern[4]; /* Should be "OggS" */
  5128. drflac_uint8 structureVersion; /* Always 0. */
  5129. drflac_uint8 headerType;
  5130. drflac_uint64 granulePosition;
  5131. drflac_uint32 serialNumber;
  5132. drflac_uint32 sequenceNumber;
  5133. drflac_uint32 checksum;
  5134. drflac_uint8 segmentCount;
  5135. drflac_uint8 segmentTable[255];
  5136. } drflac_ogg_page_header;
  5137. #endif
  5138. typedef struct
  5139. {
  5140. drflac_read_proc onRead;
  5141. drflac_seek_proc onSeek;
  5142. drflac_meta_proc onMeta;
  5143. drflac_container container;
  5144. void* pUserData;
  5145. void* pUserDataMD;
  5146. drflac_uint32 sampleRate;
  5147. drflac_uint8 channels;
  5148. drflac_uint8 bitsPerSample;
  5149. drflac_uint64 totalPCMFrameCount;
  5150. drflac_uint16 maxBlockSizeInPCMFrames;
  5151. drflac_uint64 runningFilePos;
  5152. drflac_bool32 hasStreamInfoBlock;
  5153. drflac_bool32 hasMetadataBlocks;
  5154. drflac_bs bs; /* <-- A bit streamer is required for loading data during initialization. */
  5155. drflac_frame_header firstFrameHeader; /* <-- The header of the first frame that was read during relaxed initalization. Only set if there is no STREAMINFO block. */
  5156. #ifndef DR_FLAC_NO_OGG
  5157. drflac_uint32 oggSerial;
  5158. drflac_uint64 oggFirstBytePos;
  5159. drflac_ogg_page_header oggBosHeader;
  5160. #endif
  5161. } drflac_init_info;
  5162. static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
  5163. {
  5164. blockHeader = drflac__be2host_32(blockHeader);
  5165. *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
  5166. *blockType = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
  5167. *blockSize = (blockHeader & 0x00FFFFFFUL);
  5168. }
  5169. static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
  5170. {
  5171. drflac_uint32 blockHeader;
  5172. *blockSize = 0;
  5173. if (onRead(pUserData, &blockHeader, 4) != 4) {
  5174. return DRFLAC_FALSE;
  5175. }
  5176. drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
  5177. return DRFLAC_TRUE;
  5178. }
  5179. static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
  5180. {
  5181. drflac_uint32 blockSizes;
  5182. drflac_uint64 frameSizes = 0;
  5183. drflac_uint64 importantProps;
  5184. drflac_uint8 md5[16];
  5185. /* min/max block size. */
  5186. if (onRead(pUserData, &blockSizes, 4) != 4) {
  5187. return DRFLAC_FALSE;
  5188. }
  5189. /* min/max frame size. */
  5190. if (onRead(pUserData, &frameSizes, 6) != 6) {
  5191. return DRFLAC_FALSE;
  5192. }
  5193. /* Sample rate, channels, bits per sample and total sample count. */
  5194. if (onRead(pUserData, &importantProps, 8) != 8) {
  5195. return DRFLAC_FALSE;
  5196. }
  5197. /* MD5 */
  5198. if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
  5199. return DRFLAC_FALSE;
  5200. }
  5201. blockSizes = drflac__be2host_32(blockSizes);
  5202. frameSizes = drflac__be2host_64(frameSizes);
  5203. importantProps = drflac__be2host_64(importantProps);
  5204. pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
  5205. pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
  5206. pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
  5207. pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 0)) >> 16);
  5208. pStreamInfo->sampleRate = (drflac_uint32)((importantProps & (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
  5209. pStreamInfo->channels = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
  5210. pStreamInfo->bitsPerSample = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
  5211. pStreamInfo->totalPCMFrameCount = ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
  5212. DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
  5213. return DRFLAC_TRUE;
  5214. }
  5215. static void* drflac__malloc_default(size_t sz, void* pUserData)
  5216. {
  5217. (void)pUserData;
  5218. return DRFLAC_MALLOC(sz);
  5219. }
  5220. static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
  5221. {
  5222. (void)pUserData;
  5223. return DRFLAC_REALLOC(p, sz);
  5224. }
  5225. static void drflac__free_default(void* p, void* pUserData)
  5226. {
  5227. (void)pUserData;
  5228. DRFLAC_FREE(p);
  5229. }
  5230. static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
  5231. {
  5232. if (pAllocationCallbacks == NULL) {
  5233. return NULL;
  5234. }
  5235. if (pAllocationCallbacks->onMalloc != NULL) {
  5236. return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
  5237. }
  5238. /* Try using realloc(). */
  5239. if (pAllocationCallbacks->onRealloc != NULL) {
  5240. return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
  5241. }
  5242. return NULL;
  5243. }
  5244. static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
  5245. {
  5246. if (pAllocationCallbacks == NULL) {
  5247. return NULL;
  5248. }
  5249. if (pAllocationCallbacks->onRealloc != NULL) {
  5250. return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
  5251. }
  5252. /* Try emulating realloc() in terms of malloc()/free(). */
  5253. if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
  5254. void* p2;
  5255. p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
  5256. if (p2 == NULL) {
  5257. return NULL;
  5258. }
  5259. if (p != NULL) {
  5260. DRFLAC_COPY_MEMORY(p2, p, szOld);
  5261. pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
  5262. }
  5263. return p2;
  5264. }
  5265. return NULL;
  5266. }
  5267. static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
  5268. {
  5269. if (p == NULL || pAllocationCallbacks == NULL) {
  5270. return;
  5271. }
  5272. if (pAllocationCallbacks->onFree != NULL) {
  5273. pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
  5274. }
  5275. }
  5276. static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeektableSize, drflac_allocation_callbacks* pAllocationCallbacks)
  5277. {
  5278. /*
  5279. We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
  5280. we'll be sitting on byte 42.
  5281. */
  5282. drflac_uint64 runningFilePos = 42;
  5283. drflac_uint64 seektablePos = 0;
  5284. drflac_uint32 seektableSize = 0;
  5285. for (;;) {
  5286. drflac_metadata metadata;
  5287. drflac_uint8 isLastBlock = 0;
  5288. drflac_uint8 blockType;
  5289. drflac_uint32 blockSize;
  5290. if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
  5291. return DRFLAC_FALSE;
  5292. }
  5293. runningFilePos += 4;
  5294. metadata.type = blockType;
  5295. metadata.pRawData = NULL;
  5296. metadata.rawDataSize = 0;
  5297. switch (blockType)
  5298. {
  5299. case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
  5300. {
  5301. if (blockSize < 4) {
  5302. return DRFLAC_FALSE;
  5303. }
  5304. if (onMeta) {
  5305. void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
  5306. if (pRawData == NULL) {
  5307. return DRFLAC_FALSE;
  5308. }
  5309. if (onRead(pUserData, pRawData, blockSize) != blockSize) {
  5310. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5311. return DRFLAC_FALSE;
  5312. }
  5313. metadata.pRawData = pRawData;
  5314. metadata.rawDataSize = blockSize;
  5315. metadata.data.application.id = drflac__be2host_32(*(drflac_uint32*)pRawData);
  5316. metadata.data.application.pData = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
  5317. metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
  5318. onMeta(pUserDataMD, &metadata);
  5319. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5320. }
  5321. } break;
  5322. case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
  5323. {
  5324. seektablePos = runningFilePos;
  5325. seektableSize = blockSize;
  5326. if (onMeta) {
  5327. drflac_uint32 iSeekpoint;
  5328. void* pRawData;
  5329. pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
  5330. if (pRawData == NULL) {
  5331. return DRFLAC_FALSE;
  5332. }
  5333. if (onRead(pUserData, pRawData, blockSize) != blockSize) {
  5334. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5335. return DRFLAC_FALSE;
  5336. }
  5337. metadata.pRawData = pRawData;
  5338. metadata.rawDataSize = blockSize;
  5339. metadata.data.seektable.seekpointCount = blockSize/sizeof(drflac_seekpoint);
  5340. metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
  5341. /* Endian swap. */
  5342. for (iSeekpoint = 0; iSeekpoint < metadata.data.seektable.seekpointCount; ++iSeekpoint) {
  5343. drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
  5344. pSeekpoint->firstPCMFrame = drflac__be2host_64(pSeekpoint->firstPCMFrame);
  5345. pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
  5346. pSeekpoint->pcmFrameCount = drflac__be2host_16(pSeekpoint->pcmFrameCount);
  5347. }
  5348. onMeta(pUserDataMD, &metadata);
  5349. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5350. }
  5351. } break;
  5352. case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
  5353. {
  5354. if (blockSize < 8) {
  5355. return DRFLAC_FALSE;
  5356. }
  5357. if (onMeta) {
  5358. void* pRawData;
  5359. const char* pRunningData;
  5360. const char* pRunningDataEnd;
  5361. drflac_uint32 i;
  5362. pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
  5363. if (pRawData == NULL) {
  5364. return DRFLAC_FALSE;
  5365. }
  5366. if (onRead(pUserData, pRawData, blockSize) != blockSize) {
  5367. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5368. return DRFLAC_FALSE;
  5369. }
  5370. metadata.pRawData = pRawData;
  5371. metadata.rawDataSize = blockSize;
  5372. pRunningData = (const char*)pRawData;
  5373. pRunningDataEnd = (const char*)pRawData + blockSize;
  5374. metadata.data.vorbis_comment.vendorLength = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  5375. /* Need space for the rest of the block */
  5376. if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
  5377. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5378. return DRFLAC_FALSE;
  5379. }
  5380. metadata.data.vorbis_comment.vendor = pRunningData; pRunningData += metadata.data.vorbis_comment.vendorLength;
  5381. metadata.data.vorbis_comment.commentCount = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  5382. /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */
  5383. if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
  5384. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5385. return DRFLAC_FALSE;
  5386. }
  5387. metadata.data.vorbis_comment.pComments = pRunningData;
  5388. /* Check that the comments section is valid before passing it to the callback */
  5389. for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
  5390. drflac_uint32 commentLength;
  5391. if (pRunningDataEnd - pRunningData < 4) {
  5392. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5393. return DRFLAC_FALSE;
  5394. }
  5395. commentLength = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  5396. if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
  5397. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5398. return DRFLAC_FALSE;
  5399. }
  5400. pRunningData += commentLength;
  5401. }
  5402. onMeta(pUserDataMD, &metadata);
  5403. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5404. }
  5405. } break;
  5406. case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
  5407. {
  5408. if (blockSize < 396) {
  5409. return DRFLAC_FALSE;
  5410. }
  5411. if (onMeta) {
  5412. void* pRawData;
  5413. const char* pRunningData;
  5414. const char* pRunningDataEnd;
  5415. drflac_uint8 iTrack;
  5416. drflac_uint8 iIndex;
  5417. pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
  5418. if (pRawData == NULL) {
  5419. return DRFLAC_FALSE;
  5420. }
  5421. if (onRead(pUserData, pRawData, blockSize) != blockSize) {
  5422. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5423. return DRFLAC_FALSE;
  5424. }
  5425. metadata.pRawData = pRawData;
  5426. metadata.rawDataSize = blockSize;
  5427. pRunningData = (const char*)pRawData;
  5428. pRunningDataEnd = (const char*)pRawData + blockSize;
  5429. DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128); pRunningData += 128;
  5430. metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
  5431. metadata.data.cuesheet.isCD = (pRunningData[0] & 0x80) != 0; pRunningData += 259;
  5432. metadata.data.cuesheet.trackCount = pRunningData[0]; pRunningData += 1;
  5433. metadata.data.cuesheet.pTrackData = pRunningData;
  5434. /* Check that the cuesheet tracks are valid before passing it to the callback */
  5435. for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
  5436. drflac_uint8 indexCount;
  5437. drflac_uint32 indexPointSize;
  5438. if (pRunningDataEnd - pRunningData < 36) {
  5439. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5440. return DRFLAC_FALSE;
  5441. }
  5442. /* Skip to the index point count */
  5443. pRunningData += 35;
  5444. indexCount = pRunningData[0]; pRunningData += 1;
  5445. indexPointSize = indexCount * sizeof(drflac_cuesheet_track_index);
  5446. if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
  5447. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5448. return DRFLAC_FALSE;
  5449. }
  5450. /* Endian swap. */
  5451. for (iIndex = 0; iIndex < indexCount; ++iIndex) {
  5452. drflac_cuesheet_track_index* pTrack = (drflac_cuesheet_track_index*)pRunningData;
  5453. pRunningData += sizeof(drflac_cuesheet_track_index);
  5454. pTrack->offset = drflac__be2host_64(pTrack->offset);
  5455. }
  5456. }
  5457. onMeta(pUserDataMD, &metadata);
  5458. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5459. }
  5460. } break;
  5461. case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
  5462. {
  5463. if (blockSize < 32) {
  5464. return DRFLAC_FALSE;
  5465. }
  5466. if (onMeta) {
  5467. void* pRawData;
  5468. const char* pRunningData;
  5469. const char* pRunningDataEnd;
  5470. pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
  5471. if (pRawData == NULL) {
  5472. return DRFLAC_FALSE;
  5473. }
  5474. if (onRead(pUserData, pRawData, blockSize) != blockSize) {
  5475. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5476. return DRFLAC_FALSE;
  5477. }
  5478. metadata.pRawData = pRawData;
  5479. metadata.rawDataSize = blockSize;
  5480. pRunningData = (const char*)pRawData;
  5481. pRunningDataEnd = (const char*)pRawData + blockSize;
  5482. metadata.data.picture.type = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  5483. metadata.data.picture.mimeLength = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  5484. /* Need space for the rest of the block */
  5485. if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
  5486. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5487. return DRFLAC_FALSE;
  5488. }
  5489. metadata.data.picture.mime = pRunningData; pRunningData += metadata.data.picture.mimeLength;
  5490. metadata.data.picture.descriptionLength = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  5491. /* Need space for the rest of the block */
  5492. if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
  5493. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5494. return DRFLAC_FALSE;
  5495. }
  5496. metadata.data.picture.description = pRunningData; pRunningData += metadata.data.picture.descriptionLength;
  5497. metadata.data.picture.width = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  5498. metadata.data.picture.height = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  5499. metadata.data.picture.colorDepth = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  5500. metadata.data.picture.indexColorCount = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  5501. metadata.data.picture.pictureDataSize = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  5502. metadata.data.picture.pPictureData = (const drflac_uint8*)pRunningData;
  5503. /* Need space for the picture after the block */
  5504. if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
  5505. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5506. return DRFLAC_FALSE;
  5507. }
  5508. onMeta(pUserDataMD, &metadata);
  5509. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5510. }
  5511. } break;
  5512. case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
  5513. {
  5514. if (onMeta) {
  5515. metadata.data.padding.unused = 0;
  5516. /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
  5517. if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
  5518. isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
  5519. } else {
  5520. onMeta(pUserDataMD, &metadata);
  5521. }
  5522. }
  5523. } break;
  5524. case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
  5525. {
  5526. /* Invalid chunk. Just skip over this one. */
  5527. if (onMeta) {
  5528. if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
  5529. isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
  5530. }
  5531. }
  5532. } break;
  5533. default:
  5534. {
  5535. /*
  5536. It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
  5537. can at the very least report the chunk to the application and let it look at the raw data.
  5538. */
  5539. if (onMeta) {
  5540. void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
  5541. if (pRawData == NULL) {
  5542. return DRFLAC_FALSE;
  5543. }
  5544. if (onRead(pUserData, pRawData, blockSize) != blockSize) {
  5545. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5546. return DRFLAC_FALSE;
  5547. }
  5548. metadata.pRawData = pRawData;
  5549. metadata.rawDataSize = blockSize;
  5550. onMeta(pUserDataMD, &metadata);
  5551. drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
  5552. }
  5553. } break;
  5554. }
  5555. /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
  5556. if (onMeta == NULL && blockSize > 0) {
  5557. if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
  5558. isLastBlock = DRFLAC_TRUE;
  5559. }
  5560. }
  5561. runningFilePos += blockSize;
  5562. if (isLastBlock) {
  5563. break;
  5564. }
  5565. }
  5566. *pSeektablePos = seektablePos;
  5567. *pSeektableSize = seektableSize;
  5568. *pFirstFramePos = runningFilePos;
  5569. return DRFLAC_TRUE;
  5570. }
  5571. static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
  5572. {
  5573. /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */
  5574. drflac_uint8 isLastBlock;
  5575. drflac_uint8 blockType;
  5576. drflac_uint32 blockSize;
  5577. (void)onSeek;
  5578. pInit->container = drflac_container_native;
  5579. /* The first metadata block should be the STREAMINFO block. */
  5580. if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
  5581. return DRFLAC_FALSE;
  5582. }
  5583. if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
  5584. if (!relaxed) {
  5585. /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
  5586. return DRFLAC_FALSE;
  5587. } else {
  5588. /*
  5589. Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
  5590. for that frame.
  5591. */
  5592. pInit->hasStreamInfoBlock = DRFLAC_FALSE;
  5593. pInit->hasMetadataBlocks = DRFLAC_FALSE;
  5594. if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
  5595. return DRFLAC_FALSE; /* Couldn't find a frame. */
  5596. }
  5597. if (pInit->firstFrameHeader.bitsPerSample == 0) {
  5598. return DRFLAC_FALSE; /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
  5599. }
  5600. pInit->sampleRate = pInit->firstFrameHeader.sampleRate;
  5601. pInit->channels = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
  5602. pInit->bitsPerSample = pInit->firstFrameHeader.bitsPerSample;
  5603. pInit->maxBlockSizeInPCMFrames = 65535; /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
  5604. return DRFLAC_TRUE;
  5605. }
  5606. } else {
  5607. drflac_streaminfo streaminfo;
  5608. if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
  5609. return DRFLAC_FALSE;
  5610. }
  5611. pInit->hasStreamInfoBlock = DRFLAC_TRUE;
  5612. pInit->sampleRate = streaminfo.sampleRate;
  5613. pInit->channels = streaminfo.channels;
  5614. pInit->bitsPerSample = streaminfo.bitsPerSample;
  5615. pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
  5616. pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames; /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
  5617. pInit->hasMetadataBlocks = !isLastBlock;
  5618. if (onMeta) {
  5619. drflac_metadata metadata;
  5620. metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
  5621. metadata.pRawData = NULL;
  5622. metadata.rawDataSize = 0;
  5623. metadata.data.streaminfo = streaminfo;
  5624. onMeta(pUserDataMD, &metadata);
  5625. }
  5626. return DRFLAC_TRUE;
  5627. }
  5628. }
  5629. #ifndef DR_FLAC_NO_OGG
  5630. #define DRFLAC_OGG_MAX_PAGE_SIZE 65307
  5631. #define DRFLAC_OGG_CAPTURE_PATTERN_CRC32 1605413199 /* CRC-32 of "OggS". */
  5632. typedef enum
  5633. {
  5634. drflac_ogg_recover_on_crc_mismatch,
  5635. drflac_ogg_fail_on_crc_mismatch
  5636. } drflac_ogg_crc_mismatch_recovery;
  5637. #ifndef DR_FLAC_NO_CRC
  5638. static drflac_uint32 drflac__crc32_table[] = {
  5639. 0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
  5640. 0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
  5641. 0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
  5642. 0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
  5643. 0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
  5644. 0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
  5645. 0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
  5646. 0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
  5647. 0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
  5648. 0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
  5649. 0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
  5650. 0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
  5651. 0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
  5652. 0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
  5653. 0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
  5654. 0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
  5655. 0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
  5656. 0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
  5657. 0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
  5658. 0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
  5659. 0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
  5660. 0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
  5661. 0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
  5662. 0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
  5663. 0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
  5664. 0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
  5665. 0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
  5666. 0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
  5667. 0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
  5668. 0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
  5669. 0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
  5670. 0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
  5671. 0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
  5672. 0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
  5673. 0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
  5674. 0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
  5675. 0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
  5676. 0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
  5677. 0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
  5678. 0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
  5679. 0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
  5680. 0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
  5681. 0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
  5682. 0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
  5683. 0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
  5684. 0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
  5685. 0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
  5686. 0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
  5687. 0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
  5688. 0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
  5689. 0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
  5690. 0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
  5691. 0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
  5692. 0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
  5693. 0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
  5694. 0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
  5695. 0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
  5696. 0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
  5697. 0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
  5698. 0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
  5699. 0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
  5700. 0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
  5701. 0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
  5702. 0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
  5703. };
  5704. #endif
  5705. static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
  5706. {
  5707. #ifndef DR_FLAC_NO_CRC
  5708. return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
  5709. #else
  5710. (void)data;
  5711. return crc32;
  5712. #endif
  5713. }
  5714. #if 0
  5715. static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
  5716. {
  5717. crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
  5718. crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
  5719. crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 8) & 0xFF));
  5720. crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 0) & 0xFF));
  5721. return crc32;
  5722. }
  5723. static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
  5724. {
  5725. crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
  5726. crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 0) & 0xFFFFFFFF));
  5727. return crc32;
  5728. }
  5729. #endif
  5730. static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
  5731. {
  5732. /* This can be optimized. */
  5733. drflac_uint32 i;
  5734. for (i = 0; i < dataSize; ++i) {
  5735. crc32 = drflac_crc32_byte(crc32, pData[i]);
  5736. }
  5737. return crc32;
  5738. }
  5739. static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
  5740. {
  5741. return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
  5742. }
  5743. static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
  5744. {
  5745. return 27 + pHeader->segmentCount;
  5746. }
  5747. static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
  5748. {
  5749. drflac_uint32 pageBodySize = 0;
  5750. int i;
  5751. for (i = 0; i < pHeader->segmentCount; ++i) {
  5752. pageBodySize += pHeader->segmentTable[i];
  5753. }
  5754. return pageBodySize;
  5755. }
  5756. static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
  5757. {
  5758. drflac_uint8 data[23];
  5759. drflac_uint32 i;
  5760. DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
  5761. if (onRead(pUserData, data, 23) != 23) {
  5762. return DRFLAC_AT_END;
  5763. }
  5764. *pBytesRead += 23;
  5765. /*
  5766. It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
  5767. us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
  5768. like to have it map to the structure of the underlying data.
  5769. */
  5770. pHeader->capturePattern[0] = 'O';
  5771. pHeader->capturePattern[1] = 'g';
  5772. pHeader->capturePattern[2] = 'g';
  5773. pHeader->capturePattern[3] = 'S';
  5774. pHeader->structureVersion = data[0];
  5775. pHeader->headerType = data[1];
  5776. DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
  5777. DRFLAC_COPY_MEMORY(&pHeader->serialNumber, &data[10], 4);
  5778. DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber, &data[14], 4);
  5779. DRFLAC_COPY_MEMORY(&pHeader->checksum, &data[18], 4);
  5780. pHeader->segmentCount = data[22];
  5781. /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
  5782. data[18] = 0;
  5783. data[19] = 0;
  5784. data[20] = 0;
  5785. data[21] = 0;
  5786. for (i = 0; i < 23; ++i) {
  5787. *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
  5788. }
  5789. if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
  5790. return DRFLAC_AT_END;
  5791. }
  5792. *pBytesRead += pHeader->segmentCount;
  5793. for (i = 0; i < pHeader->segmentCount; ++i) {
  5794. *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
  5795. }
  5796. return DRFLAC_SUCCESS;
  5797. }
  5798. static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
  5799. {
  5800. drflac_uint8 id[4];
  5801. *pBytesRead = 0;
  5802. if (onRead(pUserData, id, 4) != 4) {
  5803. return DRFLAC_AT_END;
  5804. }
  5805. *pBytesRead += 4;
  5806. /* We need to read byte-by-byte until we find the OggS capture pattern. */
  5807. for (;;) {
  5808. if (drflac_ogg__is_capture_pattern(id)) {
  5809. drflac_result result;
  5810. *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
  5811. result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
  5812. if (result == DRFLAC_SUCCESS) {
  5813. return DRFLAC_SUCCESS;
  5814. } else {
  5815. if (result == DRFLAC_CRC_MISMATCH) {
  5816. continue;
  5817. } else {
  5818. return result;
  5819. }
  5820. }
  5821. } else {
  5822. /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
  5823. id[0] = id[1];
  5824. id[1] = id[2];
  5825. id[2] = id[3];
  5826. if (onRead(pUserData, &id[3], 1) != 1) {
  5827. return DRFLAC_AT_END;
  5828. }
  5829. *pBytesRead += 1;
  5830. }
  5831. }
  5832. }
  5833. /*
  5834. The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
  5835. in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
  5836. in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
  5837. dr_flac is supporting there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
  5838. the physical Ogg bitstream are converted and delivered in native FLAC format.
  5839. */
  5840. typedef struct
  5841. {
  5842. drflac_read_proc onRead; /* The original onRead callback from drflac_open() and family. */
  5843. drflac_seek_proc onSeek; /* The original onSeek callback from drflac_open() and family. */
  5844. void* pUserData; /* The user data passed on onRead and onSeek. This is the user data that was passed on drflac_open() and family. */
  5845. drflac_uint64 currentBytePos; /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
  5846. drflac_uint64 firstBytePos; /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
  5847. drflac_uint32 serialNumber; /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
  5848. drflac_ogg_page_header bosPageHeader; /* Used for seeking. */
  5849. drflac_ogg_page_header currentPageHeader;
  5850. drflac_uint32 bytesRemainingInPage;
  5851. drflac_uint32 pageDataSize;
  5852. drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
  5853. } drflac_oggbs; /* oggbs = Ogg Bitstream */
  5854. static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
  5855. {
  5856. size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
  5857. oggbs->currentBytePos += bytesActuallyRead;
  5858. return bytesActuallyRead;
  5859. }
  5860. static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
  5861. {
  5862. if (origin == drflac_seek_origin_start) {
  5863. if (offset <= 0x7FFFFFFF) {
  5864. if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
  5865. return DRFLAC_FALSE;
  5866. }
  5867. oggbs->currentBytePos = offset;
  5868. return DRFLAC_TRUE;
  5869. } else {
  5870. if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
  5871. return DRFLAC_FALSE;
  5872. }
  5873. oggbs->currentBytePos = offset;
  5874. return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, drflac_seek_origin_current);
  5875. }
  5876. } else {
  5877. while (offset > 0x7FFFFFFF) {
  5878. if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
  5879. return DRFLAC_FALSE;
  5880. }
  5881. oggbs->currentBytePos += 0x7FFFFFFF;
  5882. offset -= 0x7FFFFFFF;
  5883. }
  5884. if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) { /* <-- Safe cast thanks to the loop above. */
  5885. return DRFLAC_FALSE;
  5886. }
  5887. oggbs->currentBytePos += offset;
  5888. return DRFLAC_TRUE;
  5889. }
  5890. }
  5891. static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
  5892. {
  5893. drflac_ogg_page_header header;
  5894. for (;;) {
  5895. drflac_uint32 crc32 = 0;
  5896. drflac_uint32 bytesRead;
  5897. drflac_uint32 pageBodySize;
  5898. #ifndef DR_FLAC_NO_CRC
  5899. drflac_uint32 actualCRC32;
  5900. #endif
  5901. if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
  5902. return DRFLAC_FALSE;
  5903. }
  5904. oggbs->currentBytePos += bytesRead;
  5905. pageBodySize = drflac_ogg__get_page_body_size(&header);
  5906. if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
  5907. continue; /* Invalid page size. Assume it's corrupted and just move to the next page. */
  5908. }
  5909. if (header.serialNumber != oggbs->serialNumber) {
  5910. /* It's not a FLAC page. Skip it. */
  5911. if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
  5912. return DRFLAC_FALSE;
  5913. }
  5914. continue;
  5915. }
  5916. /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
  5917. if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
  5918. return DRFLAC_FALSE;
  5919. }
  5920. oggbs->pageDataSize = pageBodySize;
  5921. #ifndef DR_FLAC_NO_CRC
  5922. actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
  5923. if (actualCRC32 != header.checksum) {
  5924. if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
  5925. continue; /* CRC mismatch. Skip this page. */
  5926. } else {
  5927. /*
  5928. Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
  5929. go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
  5930. seek did not fully complete.
  5931. */
  5932. drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
  5933. return DRFLAC_FALSE;
  5934. }
  5935. }
  5936. #else
  5937. (void)recoveryMethod; /* <-- Silence a warning. */
  5938. #endif
  5939. oggbs->currentPageHeader = header;
  5940. oggbs->bytesRemainingInPage = pageBodySize;
  5941. return DRFLAC_TRUE;
  5942. }
  5943. }
  5944. /* Function below is unused at the moment, but I might be re-adding it later. */
  5945. #if 0
  5946. static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
  5947. {
  5948. drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
  5949. drflac_uint8 iSeg = 0;
  5950. drflac_uint32 iByte = 0;
  5951. while (iByte < bytesConsumedInPage) {
  5952. drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
  5953. if (iByte + segmentSize > bytesConsumedInPage) {
  5954. break;
  5955. } else {
  5956. iSeg += 1;
  5957. iByte += segmentSize;
  5958. }
  5959. }
  5960. *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
  5961. return iSeg;
  5962. }
  5963. static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
  5964. {
  5965. /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
  5966. for (;;) {
  5967. drflac_bool32 atEndOfPage = DRFLAC_FALSE;
  5968. drflac_uint8 bytesRemainingInSeg;
  5969. drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
  5970. drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
  5971. for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
  5972. drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
  5973. if (segmentSize < 255) {
  5974. if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
  5975. atEndOfPage = DRFLAC_TRUE;
  5976. }
  5977. break;
  5978. }
  5979. bytesToEndOfPacketOrPage += segmentSize;
  5980. }
  5981. /*
  5982. At this point we will have found either the packet or the end of the page. If were at the end of the page we'll
  5983. want to load the next page and keep searching for the end of the packet.
  5984. */
  5985. drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
  5986. oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
  5987. if (atEndOfPage) {
  5988. /*
  5989. We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
  5990. straddle pages.
  5991. */
  5992. if (!drflac_oggbs__goto_next_page(oggbs)) {
  5993. return DRFLAC_FALSE;
  5994. }
  5995. /* If it's a fresh packet it most likely means we're at the next packet. */
  5996. if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
  5997. return DRFLAC_TRUE;
  5998. }
  5999. } else {
  6000. /* We're at the next packet. */
  6001. return DRFLAC_TRUE;
  6002. }
  6003. }
  6004. }
  6005. static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
  6006. {
  6007. /* The bitstream should be sitting on the first byte just after the header of the frame. */
  6008. /* What we're actually doing here is seeking to the start of the next packet. */
  6009. return drflac_oggbs__seek_to_next_packet(oggbs);
  6010. }
  6011. #endif
  6012. static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
  6013. {
  6014. drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
  6015. drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
  6016. size_t bytesRead = 0;
  6017. DRFLAC_ASSERT(oggbs != NULL);
  6018. DRFLAC_ASSERT(pRunningBufferOut != NULL);
  6019. /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
  6020. while (bytesRead < bytesToRead) {
  6021. size_t bytesRemainingToRead = bytesToRead - bytesRead;
  6022. if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
  6023. DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
  6024. bytesRead += bytesRemainingToRead;
  6025. oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
  6026. break;
  6027. }
  6028. /* If we get here it means some of the requested data is contained in the next pages. */
  6029. if (oggbs->bytesRemainingInPage > 0) {
  6030. DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
  6031. bytesRead += oggbs->bytesRemainingInPage;
  6032. pRunningBufferOut += oggbs->bytesRemainingInPage;
  6033. oggbs->bytesRemainingInPage = 0;
  6034. }
  6035. DRFLAC_ASSERT(bytesRemainingToRead > 0);
  6036. if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
  6037. break; /* Failed to go to the next page. Might have simply hit the end of the stream. */
  6038. }
  6039. }
  6040. return bytesRead;
  6041. }
  6042. static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
  6043. {
  6044. drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
  6045. int bytesSeeked = 0;
  6046. DRFLAC_ASSERT(oggbs != NULL);
  6047. DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
  6048. /* Seeking is always forward which makes things a lot simpler. */
  6049. if (origin == drflac_seek_origin_start) {
  6050. if (!drflac_oggbs__seek_physical(oggbs, (int)oggbs->firstBytePos, drflac_seek_origin_start)) {
  6051. return DRFLAC_FALSE;
  6052. }
  6053. if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
  6054. return DRFLAC_FALSE;
  6055. }
  6056. return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
  6057. }
  6058. DRFLAC_ASSERT(origin == drflac_seek_origin_current);
  6059. while (bytesSeeked < offset) {
  6060. int bytesRemainingToSeek = offset - bytesSeeked;
  6061. DRFLAC_ASSERT(bytesRemainingToSeek >= 0);
  6062. if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
  6063. bytesSeeked += bytesRemainingToSeek;
  6064. (void)bytesSeeked; /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
  6065. oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
  6066. break;
  6067. }
  6068. /* If we get here it means some of the requested data is contained in the next pages. */
  6069. if (oggbs->bytesRemainingInPage > 0) {
  6070. bytesSeeked += (int)oggbs->bytesRemainingInPage;
  6071. oggbs->bytesRemainingInPage = 0;
  6072. }
  6073. DRFLAC_ASSERT(bytesRemainingToSeek > 0);
  6074. if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
  6075. /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
  6076. return DRFLAC_FALSE;
  6077. }
  6078. }
  6079. return DRFLAC_TRUE;
  6080. }
  6081. static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
  6082. {
  6083. drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
  6084. drflac_uint64 originalBytePos;
  6085. drflac_uint64 runningGranulePosition;
  6086. drflac_uint64 runningFrameBytePos;
  6087. drflac_uint64 runningPCMFrameCount;
  6088. DRFLAC_ASSERT(oggbs != NULL);
  6089. originalBytePos = oggbs->currentBytePos; /* For recovery. Points to the OggS identifier. */
  6090. /* First seek to the first frame. */
  6091. if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
  6092. return DRFLAC_FALSE;
  6093. }
  6094. oggbs->bytesRemainingInPage = 0;
  6095. runningGranulePosition = 0;
  6096. for (;;) {
  6097. if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
  6098. drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
  6099. return DRFLAC_FALSE; /* Never did find that sample... */
  6100. }
  6101. runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
  6102. if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
  6103. break; /* The sample is somewhere in the previous page. */
  6104. }
  6105. /*
  6106. At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
  6107. disregard any pages that do not begin a fresh packet.
  6108. */
  6109. if ((oggbs->currentPageHeader.headerType & 0x01) == 0) { /* <-- Is it a fresh page? */
  6110. if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
  6111. drflac_uint8 firstBytesInPage[2];
  6112. firstBytesInPage[0] = oggbs->pageData[0];
  6113. firstBytesInPage[1] = oggbs->pageData[1];
  6114. if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) { /* <-- Does the page begin with a frame's sync code? */
  6115. runningGranulePosition = oggbs->currentPageHeader.granulePosition;
  6116. }
  6117. continue;
  6118. }
  6119. }
  6120. }
  6121. /*
  6122. We found the page that that is closest to the sample, so now we need to find it. The first thing to do is seek to the
  6123. start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
  6124. a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
  6125. we find the one containing the target sample.
  6126. */
  6127. if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
  6128. return DRFLAC_FALSE;
  6129. }
  6130. if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
  6131. return DRFLAC_FALSE;
  6132. }
  6133. /*
  6134. At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
  6135. looping over these frames until we find the one containing the sample we're after.
  6136. */
  6137. runningPCMFrameCount = runningGranulePosition;
  6138. for (;;) {
  6139. /*
  6140. There are two ways to find the sample and seek past irrelevant frames:
  6141. 1) Use the native FLAC decoder.
  6142. 2) Use Ogg's framing system.
  6143. Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
  6144. do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
  6145. duplication for the decoding of frame headers.
  6146. Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
  6147. bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
  6148. standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
  6149. the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
  6150. using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), need to be re-implemented so as to
  6151. avoid the use of the drflac_bs object.
  6152. Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
  6153. 1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
  6154. 2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
  6155. 3) Simplicity.
  6156. */
  6157. drflac_uint64 firstPCMFrameInFLACFrame = 0;
  6158. drflac_uint64 lastPCMFrameInFLACFrame = 0;
  6159. drflac_uint64 pcmFrameCountInThisFrame;
  6160. if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
  6161. return DRFLAC_FALSE;
  6162. }
  6163. drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
  6164. pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
  6165. /* If we are seeking to the end of the file and we've just hit it, we're done. */
  6166. if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
  6167. drflac_result result = drflac__decode_flac_frame(pFlac);
  6168. if (result == DRFLAC_SUCCESS) {
  6169. pFlac->currentPCMFrame = pcmFrameIndex;
  6170. pFlac->currentFLACFrame.pcmFramesRemaining = 0;
  6171. return DRFLAC_TRUE;
  6172. } else {
  6173. return DRFLAC_FALSE;
  6174. }
  6175. }
  6176. if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
  6177. /*
  6178. The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
  6179. it never existed and keep iterating.
  6180. */
  6181. drflac_result result = drflac__decode_flac_frame(pFlac);
  6182. if (result == DRFLAC_SUCCESS) {
  6183. /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
  6184. drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount); /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
  6185. if (pcmFramesToDecode == 0) {
  6186. return DRFLAC_TRUE;
  6187. }
  6188. pFlac->currentPCMFrame = runningPCMFrameCount;
  6189. return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
  6190. } else {
  6191. if (result == DRFLAC_CRC_MISMATCH) {
  6192. continue; /* CRC mismatch. Pretend this frame never existed. */
  6193. } else {
  6194. return DRFLAC_FALSE;
  6195. }
  6196. }
  6197. } else {
  6198. /*
  6199. It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
  6200. frame never existed and leave the running sample count untouched.
  6201. */
  6202. drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
  6203. if (result == DRFLAC_SUCCESS) {
  6204. runningPCMFrameCount += pcmFrameCountInThisFrame;
  6205. } else {
  6206. if (result == DRFLAC_CRC_MISMATCH) {
  6207. continue; /* CRC mismatch. Pretend this frame never existed. */
  6208. } else {
  6209. return DRFLAC_FALSE;
  6210. }
  6211. }
  6212. }
  6213. }
  6214. }
  6215. static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
  6216. {
  6217. drflac_ogg_page_header header;
  6218. drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
  6219. drflac_uint32 bytesRead = 0;
  6220. /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
  6221. (void)relaxed;
  6222. pInit->container = drflac_container_ogg;
  6223. pInit->oggFirstBytePos = 0;
  6224. /*
  6225. We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
  6226. stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
  6227. any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
  6228. */
  6229. if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
  6230. return DRFLAC_FALSE;
  6231. }
  6232. pInit->runningFilePos += bytesRead;
  6233. for (;;) {
  6234. int pageBodySize;
  6235. /* Break if we're past the beginning of stream page. */
  6236. if ((header.headerType & 0x02) == 0) {
  6237. return DRFLAC_FALSE;
  6238. }
  6239. /* Check if it's a FLAC header. */
  6240. pageBodySize = drflac_ogg__get_page_body_size(&header);
  6241. if (pageBodySize == 51) { /* 51 = the lacing value of the FLAC header packet. */
  6242. /* It could be a FLAC page... */
  6243. drflac_uint32 bytesRemainingInPage = pageBodySize;
  6244. drflac_uint8 packetType;
  6245. if (onRead(pUserData, &packetType, 1) != 1) {
  6246. return DRFLAC_FALSE;
  6247. }
  6248. bytesRemainingInPage -= 1;
  6249. if (packetType == 0x7F) {
  6250. /* Increasingly more likely to be a FLAC page... */
  6251. drflac_uint8 sig[4];
  6252. if (onRead(pUserData, sig, 4) != 4) {
  6253. return DRFLAC_FALSE;
  6254. }
  6255. bytesRemainingInPage -= 4;
  6256. if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
  6257. /* Almost certainly a FLAC page... */
  6258. drflac_uint8 mappingVersion[2];
  6259. if (onRead(pUserData, mappingVersion, 2) != 2) {
  6260. return DRFLAC_FALSE;
  6261. }
  6262. if (mappingVersion[0] != 1) {
  6263. return DRFLAC_FALSE; /* Only supporting version 1.x of the Ogg mapping. */
  6264. }
  6265. /*
  6266. The next 2 bytes are the non-audio packets, not including this one. We don't care about this because we're going to
  6267. be handling it in a generic way based on the serial number and packet types.
  6268. */
  6269. if (!onSeek(pUserData, 2, drflac_seek_origin_current)) {
  6270. return DRFLAC_FALSE;
  6271. }
  6272. /* Expecting the native FLAC signature "fLaC". */
  6273. if (onRead(pUserData, sig, 4) != 4) {
  6274. return DRFLAC_FALSE;
  6275. }
  6276. if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
  6277. /* The remaining data in the page should be the STREAMINFO block. */
  6278. drflac_streaminfo streaminfo;
  6279. drflac_uint8 isLastBlock;
  6280. drflac_uint8 blockType;
  6281. drflac_uint32 blockSize;
  6282. if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
  6283. return DRFLAC_FALSE;
  6284. }
  6285. if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
  6286. return DRFLAC_FALSE; /* Invalid block type. First block must be the STREAMINFO block. */
  6287. }
  6288. if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
  6289. /* Success! */
  6290. pInit->hasStreamInfoBlock = DRFLAC_TRUE;
  6291. pInit->sampleRate = streaminfo.sampleRate;
  6292. pInit->channels = streaminfo.channels;
  6293. pInit->bitsPerSample = streaminfo.bitsPerSample;
  6294. pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
  6295. pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
  6296. pInit->hasMetadataBlocks = !isLastBlock;
  6297. if (onMeta) {
  6298. drflac_metadata metadata;
  6299. metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
  6300. metadata.pRawData = NULL;
  6301. metadata.rawDataSize = 0;
  6302. metadata.data.streaminfo = streaminfo;
  6303. onMeta(pUserDataMD, &metadata);
  6304. }
  6305. pInit->runningFilePos += pageBodySize;
  6306. pInit->oggFirstBytePos = pInit->runningFilePos - 79; /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
  6307. pInit->oggSerial = header.serialNumber;
  6308. pInit->oggBosHeader = header;
  6309. break;
  6310. } else {
  6311. /* Failed to read STREAMINFO block. Aww, so close... */
  6312. return DRFLAC_FALSE;
  6313. }
  6314. } else {
  6315. /* Invalid file. */
  6316. return DRFLAC_FALSE;
  6317. }
  6318. } else {
  6319. /* Not a FLAC header. Skip it. */
  6320. if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
  6321. return DRFLAC_FALSE;
  6322. }
  6323. }
  6324. } else {
  6325. /* Not a FLAC header. Seek past the entire page and move on to the next. */
  6326. if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
  6327. return DRFLAC_FALSE;
  6328. }
  6329. }
  6330. } else {
  6331. if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
  6332. return DRFLAC_FALSE;
  6333. }
  6334. }
  6335. pInit->runningFilePos += pageBodySize;
  6336. /* Read the header of the next page. */
  6337. if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
  6338. return DRFLAC_FALSE;
  6339. }
  6340. pInit->runningFilePos += bytesRead;
  6341. }
  6342. /*
  6343. If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
  6344. packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
  6345. Ogg bistream object.
  6346. */
  6347. pInit->hasMetadataBlocks = DRFLAC_TRUE; /* <-- Always have at least VORBIS_COMMENT metadata block. */
  6348. return DRFLAC_TRUE;
  6349. }
  6350. #endif
  6351. static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
  6352. {
  6353. drflac_bool32 relaxed;
  6354. drflac_uint8 id[4];
  6355. if (pInit == NULL || onRead == NULL || onSeek == NULL) {
  6356. return DRFLAC_FALSE;
  6357. }
  6358. DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
  6359. pInit->onRead = onRead;
  6360. pInit->onSeek = onSeek;
  6361. pInit->onMeta = onMeta;
  6362. pInit->container = container;
  6363. pInit->pUserData = pUserData;
  6364. pInit->pUserDataMD = pUserDataMD;
  6365. pInit->bs.onRead = onRead;
  6366. pInit->bs.onSeek = onSeek;
  6367. pInit->bs.pUserData = pUserData;
  6368. drflac__reset_cache(&pInit->bs);
  6369. /* If the container is explicitly defined then we can try opening in relaxed mode. */
  6370. relaxed = container != drflac_container_unknown;
  6371. /* Skip over any ID3 tags. */
  6372. for (;;) {
  6373. if (onRead(pUserData, id, 4) != 4) {
  6374. return DRFLAC_FALSE; /* Ran out of data. */
  6375. }
  6376. pInit->runningFilePos += 4;
  6377. if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
  6378. drflac_uint8 header[6];
  6379. drflac_uint8 flags;
  6380. drflac_uint32 headerSize;
  6381. if (onRead(pUserData, header, 6) != 6) {
  6382. return DRFLAC_FALSE; /* Ran out of data. */
  6383. }
  6384. pInit->runningFilePos += 6;
  6385. flags = header[1];
  6386. DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
  6387. headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
  6388. if (flags & 0x10) {
  6389. headerSize += 10;
  6390. }
  6391. if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
  6392. return DRFLAC_FALSE; /* Failed to seek past the tag. */
  6393. }
  6394. pInit->runningFilePos += headerSize;
  6395. } else {
  6396. break;
  6397. }
  6398. }
  6399. if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
  6400. return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
  6401. }
  6402. #ifndef DR_FLAC_NO_OGG
  6403. if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
  6404. return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
  6405. }
  6406. #endif
  6407. /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
  6408. if (relaxed) {
  6409. if (container == drflac_container_native) {
  6410. return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
  6411. }
  6412. #ifndef DR_FLAC_NO_OGG
  6413. if (container == drflac_container_ogg) {
  6414. return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
  6415. }
  6416. #endif
  6417. }
  6418. /* Unsupported container. */
  6419. return DRFLAC_FALSE;
  6420. }
  6421. static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
  6422. {
  6423. DRFLAC_ASSERT(pFlac != NULL);
  6424. DRFLAC_ASSERT(pInit != NULL);
  6425. DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
  6426. pFlac->bs = pInit->bs;
  6427. pFlac->onMeta = pInit->onMeta;
  6428. pFlac->pUserDataMD = pInit->pUserDataMD;
  6429. pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
  6430. pFlac->sampleRate = pInit->sampleRate;
  6431. pFlac->channels = (drflac_uint8)pInit->channels;
  6432. pFlac->bitsPerSample = (drflac_uint8)pInit->bitsPerSample;
  6433. pFlac->totalPCMFrameCount = pInit->totalPCMFrameCount;
  6434. pFlac->container = pInit->container;
  6435. }
  6436. static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
  6437. {
  6438. drflac_init_info init;
  6439. drflac_uint32 allocationSize;
  6440. drflac_uint32 wholeSIMDVectorCountPerChannel;
  6441. drflac_uint32 decodedSamplesAllocationSize;
  6442. #ifndef DR_FLAC_NO_OGG
  6443. drflac_oggbs oggbs;
  6444. #endif
  6445. drflac_uint64 firstFramePos;
  6446. drflac_uint64 seektablePos;
  6447. drflac_uint32 seektableSize;
  6448. drflac_allocation_callbacks allocationCallbacks;
  6449. drflac* pFlac;
  6450. /* CPU support first. */
  6451. drflac__init_cpu_caps();
  6452. if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
  6453. return NULL;
  6454. }
  6455. if (pAllocationCallbacks != NULL) {
  6456. allocationCallbacks = *pAllocationCallbacks;
  6457. if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
  6458. return NULL; /* Invalid allocation callbacks. */
  6459. }
  6460. } else {
  6461. allocationCallbacks.pUserData = NULL;
  6462. allocationCallbacks.onMalloc = drflac__malloc_default;
  6463. allocationCallbacks.onRealloc = drflac__realloc_default;
  6464. allocationCallbacks.onFree = drflac__free_default;
  6465. }
  6466. /*
  6467. The size of the allocation for the drflac object needs to be large enough to fit the following:
  6468. 1) The main members of the drflac structure
  6469. 2) A block of memory large enough to store the decoded samples of the largest frame in the stream
  6470. 3) If the container is Ogg, a drflac_oggbs object
  6471. The complicated part of the allocation is making sure there's enough room the decoded samples, taking into consideration
  6472. the different SIMD instruction sets.
  6473. */
  6474. allocationSize = sizeof(drflac);
  6475. /*
  6476. The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
  6477. we are supporting.
  6478. */
  6479. if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
  6480. wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
  6481. } else {
  6482. wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
  6483. }
  6484. decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;
  6485. allocationSize += decodedSamplesAllocationSize;
  6486. allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE; /* Allocate extra bytes to ensure we have enough for alignment. */
  6487. #ifndef DR_FLAC_NO_OGG
  6488. /* There's additional data required for Ogg streams. */
  6489. if (init.container == drflac_container_ogg) {
  6490. allocationSize += sizeof(drflac_oggbs);
  6491. }
  6492. DRFLAC_ZERO_MEMORY(&oggbs, sizeof(oggbs));
  6493. if (init.container == drflac_container_ogg) {
  6494. oggbs.onRead = onRead;
  6495. oggbs.onSeek = onSeek;
  6496. oggbs.pUserData = pUserData;
  6497. oggbs.currentBytePos = init.oggFirstBytePos;
  6498. oggbs.firstBytePos = init.oggFirstBytePos;
  6499. oggbs.serialNumber = init.oggSerial;
  6500. oggbs.bosPageHeader = init.oggBosHeader;
  6501. oggbs.bytesRemainingInPage = 0;
  6502. }
  6503. #endif
  6504. /*
  6505. This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
  6506. consist of only a single heap allocation. To this, the size of the seek table needs to be known, which we determine when reading
  6507. and decoding the metadata.
  6508. */
  6509. firstFramePos = 42; /* <-- We know we are at byte 42 at this point. */
  6510. seektablePos = 0;
  6511. seektableSize = 0;
  6512. if (init.hasMetadataBlocks) {
  6513. drflac_read_proc onReadOverride = onRead;
  6514. drflac_seek_proc onSeekOverride = onSeek;
  6515. void* pUserDataOverride = pUserData;
  6516. #ifndef DR_FLAC_NO_OGG
  6517. if (init.container == drflac_container_ogg) {
  6518. onReadOverride = drflac__on_read_ogg;
  6519. onSeekOverride = drflac__on_seek_ogg;
  6520. pUserDataOverride = (void*)&oggbs;
  6521. }
  6522. #endif
  6523. if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seektableSize, &allocationCallbacks)) {
  6524. return NULL;
  6525. }
  6526. allocationSize += seektableSize;
  6527. }
  6528. pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
  6529. if (pFlac == NULL) {
  6530. return NULL;
  6531. }
  6532. drflac__init_from_info(pFlac, &init);
  6533. pFlac->allocationCallbacks = allocationCallbacks;
  6534. pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);
  6535. #ifndef DR_FLAC_NO_OGG
  6536. if (init.container == drflac_container_ogg) {
  6537. drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + seektableSize);
  6538. *pInternalOggbs = oggbs;
  6539. /* The Ogg bistream needs to be layered on top of the original bitstream. */
  6540. pFlac->bs.onRead = drflac__on_read_ogg;
  6541. pFlac->bs.onSeek = drflac__on_seek_ogg;
  6542. pFlac->bs.pUserData = (void*)pInternalOggbs;
  6543. pFlac->_oggbs = (void*)pInternalOggbs;
  6544. }
  6545. #endif
  6546. pFlac->firstFLACFramePosInBytes = firstFramePos;
  6547. /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
  6548. #ifndef DR_FLAC_NO_OGG
  6549. if (init.container == drflac_container_ogg)
  6550. {
  6551. pFlac->pSeekpoints = NULL;
  6552. pFlac->seekpointCount = 0;
  6553. }
  6554. else
  6555. #endif
  6556. {
  6557. /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
  6558. if (seektablePos != 0) {
  6559. pFlac->seekpointCount = seektableSize / sizeof(*pFlac->pSeekpoints);
  6560. pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);
  6561. DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
  6562. DRFLAC_ASSERT(pFlac->bs.onRead != NULL);
  6563. /* Seek to the seektable, then just read directly into our seektable buffer. */
  6564. if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
  6565. if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints, seektableSize) == seektableSize) {
  6566. /* Endian swap. */
  6567. drflac_uint32 iSeekpoint;
  6568. for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
  6569. pFlac->pSeekpoints[iSeekpoint].firstPCMFrame = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
  6570. pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
  6571. pFlac->pSeekpoints[iSeekpoint].pcmFrameCount = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
  6572. }
  6573. } else {
  6574. /* Failed to read the seektable. Pretend we don't have one. */
  6575. pFlac->pSeekpoints = NULL;
  6576. pFlac->seekpointCount = 0;
  6577. }
  6578. /* We need to seek back to where we were. If this fails it's a critical error. */
  6579. if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
  6580. drflac__free_from_callbacks(pFlac, &allocationCallbacks);
  6581. return NULL;
  6582. }
  6583. } else {
  6584. /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
  6585. pFlac->pSeekpoints = NULL;
  6586. pFlac->seekpointCount = 0;
  6587. }
  6588. }
  6589. }
  6590. /*
  6591. If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
  6592. the first frame.
  6593. */
  6594. if (!init.hasStreamInfoBlock) {
  6595. pFlac->currentFLACFrame.header = init.firstFrameHeader;
  6596. for (;;) {
  6597. drflac_result result = drflac__decode_flac_frame(pFlac);
  6598. if (result == DRFLAC_SUCCESS) {
  6599. break;
  6600. } else {
  6601. if (result == DRFLAC_CRC_MISMATCH) {
  6602. if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
  6603. drflac__free_from_callbacks(pFlac, &allocationCallbacks);
  6604. return NULL;
  6605. }
  6606. continue;
  6607. } else {
  6608. drflac__free_from_callbacks(pFlac, &allocationCallbacks);
  6609. return NULL;
  6610. }
  6611. }
  6612. }
  6613. }
  6614. return pFlac;
  6615. }
  6616. #ifndef DR_FLAC_NO_STDIO
  6617. #include <stdio.h>
  6618. #include <wchar.h> /* For wcslen(), wcsrtombs() */
  6619. /* drflac_result_from_errno() is only used for fopen() and wfopen() so putting it inside DR_WAV_NO_STDIO for now. If something else needs this later we can move it out. */
  6620. #include <errno.h>
  6621. static drflac_result drflac_result_from_errno(int e)
  6622. {
  6623. switch (e)
  6624. {
  6625. case 0: return DRFLAC_SUCCESS;
  6626. #ifdef EPERM
  6627. case EPERM: return DRFLAC_INVALID_OPERATION;
  6628. #endif
  6629. #ifdef ENOENT
  6630. case ENOENT: return DRFLAC_DOES_NOT_EXIST;
  6631. #endif
  6632. #ifdef ESRCH
  6633. case ESRCH: return DRFLAC_DOES_NOT_EXIST;
  6634. #endif
  6635. #ifdef EINTR
  6636. case EINTR: return DRFLAC_INTERRUPT;
  6637. #endif
  6638. #ifdef EIO
  6639. case EIO: return DRFLAC_IO_ERROR;
  6640. #endif
  6641. #ifdef ENXIO
  6642. case ENXIO: return DRFLAC_DOES_NOT_EXIST;
  6643. #endif
  6644. #ifdef E2BIG
  6645. case E2BIG: return DRFLAC_INVALID_ARGS;
  6646. #endif
  6647. #ifdef ENOEXEC
  6648. case ENOEXEC: return DRFLAC_INVALID_FILE;
  6649. #endif
  6650. #ifdef EBADF
  6651. case EBADF: return DRFLAC_INVALID_FILE;
  6652. #endif
  6653. #ifdef ECHILD
  6654. case ECHILD: return DRFLAC_ERROR;
  6655. #endif
  6656. #ifdef EAGAIN
  6657. case EAGAIN: return DRFLAC_UNAVAILABLE;
  6658. #endif
  6659. #ifdef ENOMEM
  6660. case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
  6661. #endif
  6662. #ifdef EACCES
  6663. case EACCES: return DRFLAC_ACCESS_DENIED;
  6664. #endif
  6665. #ifdef EFAULT
  6666. case EFAULT: return DRFLAC_BAD_ADDRESS;
  6667. #endif
  6668. #ifdef ENOTBLK
  6669. case ENOTBLK: return DRFLAC_ERROR;
  6670. #endif
  6671. #ifdef EBUSY
  6672. case EBUSY: return DRFLAC_BUSY;
  6673. #endif
  6674. #ifdef EEXIST
  6675. case EEXIST: return DRFLAC_ALREADY_EXISTS;
  6676. #endif
  6677. #ifdef EXDEV
  6678. case EXDEV: return DRFLAC_ERROR;
  6679. #endif
  6680. #ifdef ENODEV
  6681. case ENODEV: return DRFLAC_DOES_NOT_EXIST;
  6682. #endif
  6683. #ifdef ENOTDIR
  6684. case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
  6685. #endif
  6686. #ifdef EISDIR
  6687. case EISDIR: return DRFLAC_IS_DIRECTORY;
  6688. #endif
  6689. #ifdef EINVAL
  6690. case EINVAL: return DRFLAC_INVALID_ARGS;
  6691. #endif
  6692. #ifdef ENFILE
  6693. case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
  6694. #endif
  6695. #ifdef EMFILE
  6696. case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
  6697. #endif
  6698. #ifdef ENOTTY
  6699. case ENOTTY: return DRFLAC_INVALID_OPERATION;
  6700. #endif
  6701. #ifdef ETXTBSY
  6702. case ETXTBSY: return DRFLAC_BUSY;
  6703. #endif
  6704. #ifdef EFBIG
  6705. case EFBIG: return DRFLAC_TOO_BIG;
  6706. #endif
  6707. #ifdef ENOSPC
  6708. case ENOSPC: return DRFLAC_NO_SPACE;
  6709. #endif
  6710. #ifdef ESPIPE
  6711. case ESPIPE: return DRFLAC_BAD_SEEK;
  6712. #endif
  6713. #ifdef EROFS
  6714. case EROFS: return DRFLAC_ACCESS_DENIED;
  6715. #endif
  6716. #ifdef EMLINK
  6717. case EMLINK: return DRFLAC_TOO_MANY_LINKS;
  6718. #endif
  6719. #ifdef EPIPE
  6720. case EPIPE: return DRFLAC_BAD_PIPE;
  6721. #endif
  6722. #ifdef EDOM
  6723. case EDOM: return DRFLAC_OUT_OF_RANGE;
  6724. #endif
  6725. #ifdef ERANGE
  6726. case ERANGE: return DRFLAC_OUT_OF_RANGE;
  6727. #endif
  6728. #ifdef EDEADLK
  6729. case EDEADLK: return DRFLAC_DEADLOCK;
  6730. #endif
  6731. #ifdef ENAMETOOLONG
  6732. case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
  6733. #endif
  6734. #ifdef ENOLCK
  6735. case ENOLCK: return DRFLAC_ERROR;
  6736. #endif
  6737. #ifdef ENOSYS
  6738. case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
  6739. #endif
  6740. #ifdef ENOTEMPTY
  6741. case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
  6742. #endif
  6743. #ifdef ELOOP
  6744. case ELOOP: return DRFLAC_TOO_MANY_LINKS;
  6745. #endif
  6746. #ifdef ENOMSG
  6747. case ENOMSG: return DRFLAC_NO_MESSAGE;
  6748. #endif
  6749. #ifdef EIDRM
  6750. case EIDRM: return DRFLAC_ERROR;
  6751. #endif
  6752. #ifdef ECHRNG
  6753. case ECHRNG: return DRFLAC_ERROR;
  6754. #endif
  6755. #ifdef EL2NSYNC
  6756. case EL2NSYNC: return DRFLAC_ERROR;
  6757. #endif
  6758. #ifdef EL3HLT
  6759. case EL3HLT: return DRFLAC_ERROR;
  6760. #endif
  6761. #ifdef EL3RST
  6762. case EL3RST: return DRFLAC_ERROR;
  6763. #endif
  6764. #ifdef ELNRNG
  6765. case ELNRNG: return DRFLAC_OUT_OF_RANGE;
  6766. #endif
  6767. #ifdef EUNATCH
  6768. case EUNATCH: return DRFLAC_ERROR;
  6769. #endif
  6770. #ifdef ENOCSI
  6771. case ENOCSI: return DRFLAC_ERROR;
  6772. #endif
  6773. #ifdef EL2HLT
  6774. case EL2HLT: return DRFLAC_ERROR;
  6775. #endif
  6776. #ifdef EBADE
  6777. case EBADE: return DRFLAC_ERROR;
  6778. #endif
  6779. #ifdef EBADR
  6780. case EBADR: return DRFLAC_ERROR;
  6781. #endif
  6782. #ifdef EXFULL
  6783. case EXFULL: return DRFLAC_ERROR;
  6784. #endif
  6785. #ifdef ENOANO
  6786. case ENOANO: return DRFLAC_ERROR;
  6787. #endif
  6788. #ifdef EBADRQC
  6789. case EBADRQC: return DRFLAC_ERROR;
  6790. #endif
  6791. #ifdef EBADSLT
  6792. case EBADSLT: return DRFLAC_ERROR;
  6793. #endif
  6794. #ifdef EBFONT
  6795. case EBFONT: return DRFLAC_INVALID_FILE;
  6796. #endif
  6797. #ifdef ENOSTR
  6798. case ENOSTR: return DRFLAC_ERROR;
  6799. #endif
  6800. #ifdef ENODATA
  6801. case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
  6802. #endif
  6803. #ifdef ETIME
  6804. case ETIME: return DRFLAC_TIMEOUT;
  6805. #endif
  6806. #ifdef ENOSR
  6807. case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
  6808. #endif
  6809. #ifdef ENONET
  6810. case ENONET: return DRFLAC_NO_NETWORK;
  6811. #endif
  6812. #ifdef ENOPKG
  6813. case ENOPKG: return DRFLAC_ERROR;
  6814. #endif
  6815. #ifdef EREMOTE
  6816. case EREMOTE: return DRFLAC_ERROR;
  6817. #endif
  6818. #ifdef ENOLINK
  6819. case ENOLINK: return DRFLAC_ERROR;
  6820. #endif
  6821. #ifdef EADV
  6822. case EADV: return DRFLAC_ERROR;
  6823. #endif
  6824. #ifdef ESRMNT
  6825. case ESRMNT: return DRFLAC_ERROR;
  6826. #endif
  6827. #ifdef ECOMM
  6828. case ECOMM: return DRFLAC_ERROR;
  6829. #endif
  6830. #ifdef EPROTO
  6831. case EPROTO: return DRFLAC_ERROR;
  6832. #endif
  6833. #ifdef EMULTIHOP
  6834. case EMULTIHOP: return DRFLAC_ERROR;
  6835. #endif
  6836. #ifdef EDOTDOT
  6837. case EDOTDOT: return DRFLAC_ERROR;
  6838. #endif
  6839. #ifdef EBADMSG
  6840. case EBADMSG: return DRFLAC_BAD_MESSAGE;
  6841. #endif
  6842. #ifdef EOVERFLOW
  6843. case EOVERFLOW: return DRFLAC_TOO_BIG;
  6844. #endif
  6845. #ifdef ENOTUNIQ
  6846. case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
  6847. #endif
  6848. #ifdef EBADFD
  6849. case EBADFD: return DRFLAC_ERROR;
  6850. #endif
  6851. #ifdef EREMCHG
  6852. case EREMCHG: return DRFLAC_ERROR;
  6853. #endif
  6854. #ifdef ELIBACC
  6855. case ELIBACC: return DRFLAC_ACCESS_DENIED;
  6856. #endif
  6857. #ifdef ELIBBAD
  6858. case ELIBBAD: return DRFLAC_INVALID_FILE;
  6859. #endif
  6860. #ifdef ELIBSCN
  6861. case ELIBSCN: return DRFLAC_INVALID_FILE;
  6862. #endif
  6863. #ifdef ELIBMAX
  6864. case ELIBMAX: return DRFLAC_ERROR;
  6865. #endif
  6866. #ifdef ELIBEXEC
  6867. case ELIBEXEC: return DRFLAC_ERROR;
  6868. #endif
  6869. #ifdef EILSEQ
  6870. case EILSEQ: return DRFLAC_INVALID_DATA;
  6871. #endif
  6872. #ifdef ERESTART
  6873. case ERESTART: return DRFLAC_ERROR;
  6874. #endif
  6875. #ifdef ESTRPIPE
  6876. case ESTRPIPE: return DRFLAC_ERROR;
  6877. #endif
  6878. #ifdef EUSERS
  6879. case EUSERS: return DRFLAC_ERROR;
  6880. #endif
  6881. #ifdef ENOTSOCK
  6882. case ENOTSOCK: return DRFLAC_NOT_SOCKET;
  6883. #endif
  6884. #ifdef EDESTADDRREQ
  6885. case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
  6886. #endif
  6887. #ifdef EMSGSIZE
  6888. case EMSGSIZE: return DRFLAC_TOO_BIG;
  6889. #endif
  6890. #ifdef EPROTOTYPE
  6891. case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
  6892. #endif
  6893. #ifdef ENOPROTOOPT
  6894. case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
  6895. #endif
  6896. #ifdef EPROTONOSUPPORT
  6897. case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
  6898. #endif
  6899. #ifdef ESOCKTNOSUPPORT
  6900. case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
  6901. #endif
  6902. #ifdef EOPNOTSUPP
  6903. case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
  6904. #endif
  6905. #ifdef EPFNOSUPPORT
  6906. case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
  6907. #endif
  6908. #ifdef EAFNOSUPPORT
  6909. case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
  6910. #endif
  6911. #ifdef EADDRINUSE
  6912. case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
  6913. #endif
  6914. #ifdef EADDRNOTAVAIL
  6915. case EADDRNOTAVAIL: return DRFLAC_ERROR;
  6916. #endif
  6917. #ifdef ENETDOWN
  6918. case ENETDOWN: return DRFLAC_NO_NETWORK;
  6919. #endif
  6920. #ifdef ENETUNREACH
  6921. case ENETUNREACH: return DRFLAC_NO_NETWORK;
  6922. #endif
  6923. #ifdef ENETRESET
  6924. case ENETRESET: return DRFLAC_NO_NETWORK;
  6925. #endif
  6926. #ifdef ECONNABORTED
  6927. case ECONNABORTED: return DRFLAC_NO_NETWORK;
  6928. #endif
  6929. #ifdef ECONNRESET
  6930. case ECONNRESET: return DRFLAC_CONNECTION_RESET;
  6931. #endif
  6932. #ifdef ENOBUFS
  6933. case ENOBUFS: return DRFLAC_NO_SPACE;
  6934. #endif
  6935. #ifdef EISCONN
  6936. case EISCONN: return DRFLAC_ALREADY_CONNECTED;
  6937. #endif
  6938. #ifdef ENOTCONN
  6939. case ENOTCONN: return DRFLAC_NOT_CONNECTED;
  6940. #endif
  6941. #ifdef ESHUTDOWN
  6942. case ESHUTDOWN: return DRFLAC_ERROR;
  6943. #endif
  6944. #ifdef ETOOMANYREFS
  6945. case ETOOMANYREFS: return DRFLAC_ERROR;
  6946. #endif
  6947. #ifdef ETIMEDOUT
  6948. case ETIMEDOUT: return DRFLAC_TIMEOUT;
  6949. #endif
  6950. #ifdef ECONNREFUSED
  6951. case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
  6952. #endif
  6953. #ifdef EHOSTDOWN
  6954. case EHOSTDOWN: return DRFLAC_NO_HOST;
  6955. #endif
  6956. #ifdef EHOSTUNREACH
  6957. case EHOSTUNREACH: return DRFLAC_NO_HOST;
  6958. #endif
  6959. #ifdef EALREADY
  6960. case EALREADY: return DRFLAC_IN_PROGRESS;
  6961. #endif
  6962. #ifdef EINPROGRESS
  6963. case EINPROGRESS: return DRFLAC_IN_PROGRESS;
  6964. #endif
  6965. #ifdef ESTALE
  6966. case ESTALE: return DRFLAC_INVALID_FILE;
  6967. #endif
  6968. #ifdef EUCLEAN
  6969. case EUCLEAN: return DRFLAC_ERROR;
  6970. #endif
  6971. #ifdef ENOTNAM
  6972. case ENOTNAM: return DRFLAC_ERROR;
  6973. #endif
  6974. #ifdef ENAVAIL
  6975. case ENAVAIL: return DRFLAC_ERROR;
  6976. #endif
  6977. #ifdef EISNAM
  6978. case EISNAM: return DRFLAC_ERROR;
  6979. #endif
  6980. #ifdef EREMOTEIO
  6981. case EREMOTEIO: return DRFLAC_IO_ERROR;
  6982. #endif
  6983. #ifdef EDQUOT
  6984. case EDQUOT: return DRFLAC_NO_SPACE;
  6985. #endif
  6986. #ifdef ENOMEDIUM
  6987. case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
  6988. #endif
  6989. #ifdef EMEDIUMTYPE
  6990. case EMEDIUMTYPE: return DRFLAC_ERROR;
  6991. #endif
  6992. #ifdef ECANCELED
  6993. case ECANCELED: return DRFLAC_CANCELLED;
  6994. #endif
  6995. #ifdef ENOKEY
  6996. case ENOKEY: return DRFLAC_ERROR;
  6997. #endif
  6998. #ifdef EKEYEXPIRED
  6999. case EKEYEXPIRED: return DRFLAC_ERROR;
  7000. #endif
  7001. #ifdef EKEYREVOKED
  7002. case EKEYREVOKED: return DRFLAC_ERROR;
  7003. #endif
  7004. #ifdef EKEYREJECTED
  7005. case EKEYREJECTED: return DRFLAC_ERROR;
  7006. #endif
  7007. #ifdef EOWNERDEAD
  7008. case EOWNERDEAD: return DRFLAC_ERROR;
  7009. #endif
  7010. #ifdef ENOTRECOVERABLE
  7011. case ENOTRECOVERABLE: return DRFLAC_ERROR;
  7012. #endif
  7013. #ifdef ERFKILL
  7014. case ERFKILL: return DRFLAC_ERROR;
  7015. #endif
  7016. #ifdef EHWPOISON
  7017. case EHWPOISON: return DRFLAC_ERROR;
  7018. #endif
  7019. default: return DRFLAC_ERROR;
  7020. }
  7021. }
  7022. static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
  7023. {
  7024. #if defined(_MSC_VER) && _MSC_VER >= 1400
  7025. errno_t err;
  7026. #endif
  7027. if (ppFile != NULL) {
  7028. *ppFile = NULL; /* Safety. */
  7029. }
  7030. if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
  7031. return DRFLAC_INVALID_ARGS;
  7032. }
  7033. #if defined(_MSC_VER) && _MSC_VER >= 1400
  7034. err = fopen_s(ppFile, pFilePath, pOpenMode);
  7035. if (err != 0) {
  7036. return drflac_result_from_errno(err);
  7037. }
  7038. #else
  7039. #if defined(_WIN32) || defined(__APPLE__)
  7040. *ppFile = fopen(pFilePath, pOpenMode);
  7041. #else
  7042. #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
  7043. *ppFile = fopen64(pFilePath, pOpenMode);
  7044. #else
  7045. *ppFile = fopen(pFilePath, pOpenMode);
  7046. #endif
  7047. #endif
  7048. if (*ppFile == NULL) {
  7049. drflac_result result = drflac_result_from_errno(errno);
  7050. if (result == DRFLAC_SUCCESS) {
  7051. result = DRFLAC_ERROR; /* Just a safety check to make sure we never ever return success when pFile == NULL. */
  7052. }
  7053. return result;
  7054. }
  7055. #endif
  7056. return DRFLAC_SUCCESS;
  7057. }
  7058. /*
  7059. _wfopen() isn't always available in all compilation environments.
  7060. * Windows only.
  7061. * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
  7062. * MinGW-64 (both 32- and 64-bit) seems to support it.
  7063. * MinGW wraps it in !defined(__STRICT_ANSI__).
  7064. * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).
  7065. This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
  7066. fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
  7067. */
  7068. #if defined(_WIN32)
  7069. #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
  7070. #define DRFLAC_HAS_WFOPEN
  7071. #endif
  7072. #endif
  7073. static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
  7074. {
  7075. if (ppFile != NULL) {
  7076. *ppFile = NULL; /* Safety. */
  7077. }
  7078. if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
  7079. return DRFLAC_INVALID_ARGS;
  7080. }
  7081. #if defined(DRFLAC_HAS_WFOPEN)
  7082. {
  7083. /* Use _wfopen() on Windows. */
  7084. #if defined(_MSC_VER) && _MSC_VER >= 1400
  7085. errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
  7086. if (err != 0) {
  7087. return drflac_result_from_errno(err);
  7088. }
  7089. #else
  7090. *ppFile = _wfopen(pFilePath, pOpenMode);
  7091. if (*ppFile == NULL) {
  7092. return drflac_result_from_errno(errno);
  7093. }
  7094. #endif
  7095. (void)pAllocationCallbacks;
  7096. }
  7097. #else
  7098. /*
  7099. Use fopen() on anything other than Windows. Requires a conversion. This is annoying because fopen() is locale specific. The only real way I can
  7100. think of to do this is with wcsrtombs(). Note that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
  7101. maintaining state. I've checked this with -std=c89 and it works, but if somebody get's a compiler error I'll look into improving compatibility.
  7102. */
  7103. {
  7104. mbstate_t mbs;
  7105. size_t lenMB;
  7106. const wchar_t* pFilePathTemp = pFilePath;
  7107. char* pFilePathMB = NULL;
  7108. char pOpenModeMB[32] = {0};
  7109. /* Get the length first. */
  7110. DRFLAC_ZERO_OBJECT(&mbs);
  7111. lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
  7112. if (lenMB == (size_t)-1) {
  7113. return drflac_result_from_errno(errno);
  7114. }
  7115. pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
  7116. if (pFilePathMB == NULL) {
  7117. return DRFLAC_OUT_OF_MEMORY;
  7118. }
  7119. pFilePathTemp = pFilePath;
  7120. DRFLAC_ZERO_OBJECT(&mbs);
  7121. wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);
  7122. /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
  7123. {
  7124. size_t i = 0;
  7125. for (;;) {
  7126. if (pOpenMode[i] == 0) {
  7127. pOpenModeMB[i] = '\0';
  7128. break;
  7129. }
  7130. pOpenModeMB[i] = (char)pOpenMode[i];
  7131. i += 1;
  7132. }
  7133. }
  7134. *ppFile = fopen(pFilePathMB, pOpenModeMB);
  7135. drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
  7136. }
  7137. if (*ppFile == NULL) {
  7138. return DRFLAC_ERROR;
  7139. }
  7140. #endif
  7141. return DRFLAC_SUCCESS;
  7142. }
  7143. static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
  7144. {
  7145. return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
  7146. }
  7147. static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
  7148. {
  7149. DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
  7150. return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
  7151. }
  7152. DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
  7153. {
  7154. drflac* pFlac;
  7155. FILE* pFile;
  7156. if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
  7157. return NULL;
  7158. }
  7159. pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
  7160. if (pFlac == NULL) {
  7161. fclose(pFile);
  7162. return NULL;
  7163. }
  7164. return pFlac;
  7165. }
  7166. DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
  7167. {
  7168. drflac* pFlac;
  7169. FILE* pFile;
  7170. if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
  7171. return NULL;
  7172. }
  7173. pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
  7174. if (pFlac == NULL) {
  7175. fclose(pFile);
  7176. return NULL;
  7177. }
  7178. return pFlac;
  7179. }
  7180. DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
  7181. {
  7182. drflac* pFlac;
  7183. FILE* pFile;
  7184. if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
  7185. return NULL;
  7186. }
  7187. pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
  7188. if (pFlac == NULL) {
  7189. fclose(pFile);
  7190. return pFlac;
  7191. }
  7192. return pFlac;
  7193. }
  7194. DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
  7195. {
  7196. drflac* pFlac;
  7197. FILE* pFile;
  7198. if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
  7199. return NULL;
  7200. }
  7201. pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
  7202. if (pFlac == NULL) {
  7203. fclose(pFile);
  7204. return pFlac;
  7205. }
  7206. return pFlac;
  7207. }
  7208. #endif /* DR_FLAC_NO_STDIO */
  7209. static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
  7210. {
  7211. drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
  7212. size_t bytesRemaining;
  7213. DRFLAC_ASSERT(memoryStream != NULL);
  7214. DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);
  7215. bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
  7216. if (bytesToRead > bytesRemaining) {
  7217. bytesToRead = bytesRemaining;
  7218. }
  7219. if (bytesToRead > 0) {
  7220. DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
  7221. memoryStream->currentReadPos += bytesToRead;
  7222. }
  7223. return bytesToRead;
  7224. }
  7225. static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
  7226. {
  7227. drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
  7228. DRFLAC_ASSERT(memoryStream != NULL);
  7229. DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
  7230. if (offset > (drflac_int64)memoryStream->dataSize) {
  7231. return DRFLAC_FALSE;
  7232. }
  7233. if (origin == drflac_seek_origin_current) {
  7234. if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
  7235. memoryStream->currentReadPos += offset;
  7236. } else {
  7237. return DRFLAC_FALSE; /* Trying to seek too far forward. */
  7238. }
  7239. } else {
  7240. if ((drflac_uint32)offset <= memoryStream->dataSize) {
  7241. memoryStream->currentReadPos = offset;
  7242. } else {
  7243. return DRFLAC_FALSE; /* Trying to seek too far forward. */
  7244. }
  7245. }
  7246. return DRFLAC_TRUE;
  7247. }
  7248. DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
  7249. {
  7250. drflac__memory_stream memoryStream;
  7251. drflac* pFlac;
  7252. memoryStream.data = (const drflac_uint8*)pData;
  7253. memoryStream.dataSize = dataSize;
  7254. memoryStream.currentReadPos = 0;
  7255. pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
  7256. if (pFlac == NULL) {
  7257. return NULL;
  7258. }
  7259. pFlac->memoryStream = memoryStream;
  7260. /* This is an awful hack... */
  7261. #ifndef DR_FLAC_NO_OGG
  7262. if (pFlac->container == drflac_container_ogg)
  7263. {
  7264. drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
  7265. oggbs->pUserData = &pFlac->memoryStream;
  7266. }
  7267. else
  7268. #endif
  7269. {
  7270. pFlac->bs.pUserData = &pFlac->memoryStream;
  7271. }
  7272. return pFlac;
  7273. }
  7274. DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
  7275. {
  7276. drflac__memory_stream memoryStream;
  7277. drflac* pFlac;
  7278. memoryStream.data = (const drflac_uint8*)pData;
  7279. memoryStream.dataSize = dataSize;
  7280. memoryStream.currentReadPos = 0;
  7281. pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
  7282. if (pFlac == NULL) {
  7283. return NULL;
  7284. }
  7285. pFlac->memoryStream = memoryStream;
  7286. /* This is an awful hack... */
  7287. #ifndef DR_FLAC_NO_OGG
  7288. if (pFlac->container == drflac_container_ogg)
  7289. {
  7290. drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
  7291. oggbs->pUserData = &pFlac->memoryStream;
  7292. }
  7293. else
  7294. #endif
  7295. {
  7296. pFlac->bs.pUserData = &pFlac->memoryStream;
  7297. }
  7298. return pFlac;
  7299. }
  7300. DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
  7301. {
  7302. return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
  7303. }
  7304. DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
  7305. {
  7306. return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
  7307. }
  7308. DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
  7309. {
  7310. return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
  7311. }
  7312. DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
  7313. {
  7314. return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
  7315. }
  7316. DRFLAC_API void drflac_close(drflac* pFlac)
  7317. {
  7318. if (pFlac == NULL) {
  7319. return;
  7320. }
  7321. #ifndef DR_FLAC_NO_STDIO
  7322. /*
  7323. If we opened the file with drflac_open_file() we will want to close the file handle. We can know whether or not drflac_open_file()
  7324. was used by looking at the callbacks.
  7325. */
  7326. if (pFlac->bs.onRead == drflac__on_read_stdio) {
  7327. fclose((FILE*)pFlac->bs.pUserData);
  7328. }
  7329. #ifndef DR_FLAC_NO_OGG
  7330. /* Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. */
  7331. if (pFlac->container == drflac_container_ogg) {
  7332. drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
  7333. DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);
  7334. if (oggbs->onRead == drflac__on_read_stdio) {
  7335. fclose((FILE*)oggbs->pUserData);
  7336. }
  7337. }
  7338. #endif
  7339. #endif
  7340. drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
  7341. }
  7342. #if 0
  7343. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7344. {
  7345. drflac_uint64 i;
  7346. for (i = 0; i < frameCount; ++i) {
  7347. drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  7348. drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  7349. drflac_uint32 right = left - side;
  7350. pOutputSamples[i*2+0] = (drflac_int32)left;
  7351. pOutputSamples[i*2+1] = (drflac_int32)right;
  7352. }
  7353. }
  7354. #endif
  7355. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7356. {
  7357. drflac_uint64 i;
  7358. drflac_uint64 frameCount4 = frameCount >> 2;
  7359. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7360. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7361. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7362. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7363. for (i = 0; i < frameCount4; ++i) {
  7364. drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
  7365. drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
  7366. drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
  7367. drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
  7368. drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
  7369. drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
  7370. drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
  7371. drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
  7372. drflac_uint32 right0 = left0 - side0;
  7373. drflac_uint32 right1 = left1 - side1;
  7374. drflac_uint32 right2 = left2 - side2;
  7375. drflac_uint32 right3 = left3 - side3;
  7376. pOutputSamples[i*8+0] = (drflac_int32)left0;
  7377. pOutputSamples[i*8+1] = (drflac_int32)right0;
  7378. pOutputSamples[i*8+2] = (drflac_int32)left1;
  7379. pOutputSamples[i*8+3] = (drflac_int32)right1;
  7380. pOutputSamples[i*8+4] = (drflac_int32)left2;
  7381. pOutputSamples[i*8+5] = (drflac_int32)right2;
  7382. pOutputSamples[i*8+6] = (drflac_int32)left3;
  7383. pOutputSamples[i*8+7] = (drflac_int32)right3;
  7384. }
  7385. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7386. drflac_uint32 left = pInputSamples0U32[i] << shift0;
  7387. drflac_uint32 side = pInputSamples1U32[i] << shift1;
  7388. drflac_uint32 right = left - side;
  7389. pOutputSamples[i*2+0] = (drflac_int32)left;
  7390. pOutputSamples[i*2+1] = (drflac_int32)right;
  7391. }
  7392. }
  7393. #if defined(DRFLAC_SUPPORT_SSE2)
  7394. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7395. {
  7396. drflac_uint64 i;
  7397. drflac_uint64 frameCount4 = frameCount >> 2;
  7398. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7399. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7400. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7401. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7402. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  7403. for (i = 0; i < frameCount4; ++i) {
  7404. __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  7405. __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  7406. __m128i right = _mm_sub_epi32(left, side);
  7407. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
  7408. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
  7409. }
  7410. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7411. drflac_uint32 left = pInputSamples0U32[i] << shift0;
  7412. drflac_uint32 side = pInputSamples1U32[i] << shift1;
  7413. drflac_uint32 right = left - side;
  7414. pOutputSamples[i*2+0] = (drflac_int32)left;
  7415. pOutputSamples[i*2+1] = (drflac_int32)right;
  7416. }
  7417. }
  7418. #endif
  7419. #if defined(DRFLAC_SUPPORT_NEON)
  7420. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7421. {
  7422. drflac_uint64 i;
  7423. drflac_uint64 frameCount4 = frameCount >> 2;
  7424. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7425. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7426. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7427. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7428. int32x4_t shift0_4;
  7429. int32x4_t shift1_4;
  7430. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  7431. shift0_4 = vdupq_n_s32(shift0);
  7432. shift1_4 = vdupq_n_s32(shift1);
  7433. for (i = 0; i < frameCount4; ++i) {
  7434. uint32x4_t left;
  7435. uint32x4_t side;
  7436. uint32x4_t right;
  7437. left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
  7438. side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
  7439. right = vsubq_u32(left, side);
  7440. drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
  7441. }
  7442. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7443. drflac_uint32 left = pInputSamples0U32[i] << shift0;
  7444. drflac_uint32 side = pInputSamples1U32[i] << shift1;
  7445. drflac_uint32 right = left - side;
  7446. pOutputSamples[i*2+0] = (drflac_int32)left;
  7447. pOutputSamples[i*2+1] = (drflac_int32)right;
  7448. }
  7449. }
  7450. #endif
  7451. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7452. {
  7453. #if defined(DRFLAC_SUPPORT_SSE2)
  7454. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  7455. drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7456. } else
  7457. #elif defined(DRFLAC_SUPPORT_NEON)
  7458. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  7459. drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7460. } else
  7461. #endif
  7462. {
  7463. /* Scalar fallback. */
  7464. #if 0
  7465. drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7466. #else
  7467. drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7468. #endif
  7469. }
  7470. }
  7471. #if 0
  7472. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7473. {
  7474. drflac_uint64 i;
  7475. for (i = 0; i < frameCount; ++i) {
  7476. drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  7477. drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  7478. drflac_uint32 left = right + side;
  7479. pOutputSamples[i*2+0] = (drflac_int32)left;
  7480. pOutputSamples[i*2+1] = (drflac_int32)right;
  7481. }
  7482. }
  7483. #endif
  7484. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7485. {
  7486. drflac_uint64 i;
  7487. drflac_uint64 frameCount4 = frameCount >> 2;
  7488. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7489. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7490. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7491. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7492. for (i = 0; i < frameCount4; ++i) {
  7493. drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
  7494. drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
  7495. drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
  7496. drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
  7497. drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
  7498. drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
  7499. drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
  7500. drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
  7501. drflac_uint32 left0 = right0 + side0;
  7502. drflac_uint32 left1 = right1 + side1;
  7503. drflac_uint32 left2 = right2 + side2;
  7504. drflac_uint32 left3 = right3 + side3;
  7505. pOutputSamples[i*8+0] = (drflac_int32)left0;
  7506. pOutputSamples[i*8+1] = (drflac_int32)right0;
  7507. pOutputSamples[i*8+2] = (drflac_int32)left1;
  7508. pOutputSamples[i*8+3] = (drflac_int32)right1;
  7509. pOutputSamples[i*8+4] = (drflac_int32)left2;
  7510. pOutputSamples[i*8+5] = (drflac_int32)right2;
  7511. pOutputSamples[i*8+6] = (drflac_int32)left3;
  7512. pOutputSamples[i*8+7] = (drflac_int32)right3;
  7513. }
  7514. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7515. drflac_uint32 side = pInputSamples0U32[i] << shift0;
  7516. drflac_uint32 right = pInputSamples1U32[i] << shift1;
  7517. drflac_uint32 left = right + side;
  7518. pOutputSamples[i*2+0] = (drflac_int32)left;
  7519. pOutputSamples[i*2+1] = (drflac_int32)right;
  7520. }
  7521. }
  7522. #if defined(DRFLAC_SUPPORT_SSE2)
  7523. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7524. {
  7525. drflac_uint64 i;
  7526. drflac_uint64 frameCount4 = frameCount >> 2;
  7527. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7528. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7529. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7530. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7531. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  7532. for (i = 0; i < frameCount4; ++i) {
  7533. __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  7534. __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  7535. __m128i left = _mm_add_epi32(right, side);
  7536. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
  7537. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
  7538. }
  7539. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7540. drflac_uint32 side = pInputSamples0U32[i] << shift0;
  7541. drflac_uint32 right = pInputSamples1U32[i] << shift1;
  7542. drflac_uint32 left = right + side;
  7543. pOutputSamples[i*2+0] = (drflac_int32)left;
  7544. pOutputSamples[i*2+1] = (drflac_int32)right;
  7545. }
  7546. }
  7547. #endif
  7548. #if defined(DRFLAC_SUPPORT_NEON)
  7549. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7550. {
  7551. drflac_uint64 i;
  7552. drflac_uint64 frameCount4 = frameCount >> 2;
  7553. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7554. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7555. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7556. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7557. int32x4_t shift0_4;
  7558. int32x4_t shift1_4;
  7559. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  7560. shift0_4 = vdupq_n_s32(shift0);
  7561. shift1_4 = vdupq_n_s32(shift1);
  7562. for (i = 0; i < frameCount4; ++i) {
  7563. uint32x4_t side;
  7564. uint32x4_t right;
  7565. uint32x4_t left;
  7566. side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
  7567. right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
  7568. left = vaddq_u32(right, side);
  7569. drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
  7570. }
  7571. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7572. drflac_uint32 side = pInputSamples0U32[i] << shift0;
  7573. drflac_uint32 right = pInputSamples1U32[i] << shift1;
  7574. drflac_uint32 left = right + side;
  7575. pOutputSamples[i*2+0] = (drflac_int32)left;
  7576. pOutputSamples[i*2+1] = (drflac_int32)right;
  7577. }
  7578. }
  7579. #endif
  7580. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7581. {
  7582. #if defined(DRFLAC_SUPPORT_SSE2)
  7583. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  7584. drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7585. } else
  7586. #elif defined(DRFLAC_SUPPORT_NEON)
  7587. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  7588. drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7589. } else
  7590. #endif
  7591. {
  7592. /* Scalar fallback. */
  7593. #if 0
  7594. drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7595. #else
  7596. drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7597. #endif
  7598. }
  7599. }
  7600. #if 0
  7601. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7602. {
  7603. for (drflac_uint64 i = 0; i < frameCount; ++i) {
  7604. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7605. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7606. mid = (mid << 1) | (side & 0x01);
  7607. pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
  7608. pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
  7609. }
  7610. }
  7611. #endif
  7612. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7613. {
  7614. drflac_uint64 i;
  7615. drflac_uint64 frameCount4 = frameCount >> 2;
  7616. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7617. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7618. drflac_int32 shift = unusedBitsPerSample;
  7619. if (shift > 0) {
  7620. shift -= 1;
  7621. for (i = 0; i < frameCount4; ++i) {
  7622. drflac_uint32 temp0L;
  7623. drflac_uint32 temp1L;
  7624. drflac_uint32 temp2L;
  7625. drflac_uint32 temp3L;
  7626. drflac_uint32 temp0R;
  7627. drflac_uint32 temp1R;
  7628. drflac_uint32 temp2R;
  7629. drflac_uint32 temp3R;
  7630. drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7631. drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7632. drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7633. drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7634. drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7635. drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7636. drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7637. drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7638. mid0 = (mid0 << 1) | (side0 & 0x01);
  7639. mid1 = (mid1 << 1) | (side1 & 0x01);
  7640. mid2 = (mid2 << 1) | (side2 & 0x01);
  7641. mid3 = (mid3 << 1) | (side3 & 0x01);
  7642. temp0L = (mid0 + side0) << shift;
  7643. temp1L = (mid1 + side1) << shift;
  7644. temp2L = (mid2 + side2) << shift;
  7645. temp3L = (mid3 + side3) << shift;
  7646. temp0R = (mid0 - side0) << shift;
  7647. temp1R = (mid1 - side1) << shift;
  7648. temp2R = (mid2 - side2) << shift;
  7649. temp3R = (mid3 - side3) << shift;
  7650. pOutputSamples[i*8+0] = (drflac_int32)temp0L;
  7651. pOutputSamples[i*8+1] = (drflac_int32)temp0R;
  7652. pOutputSamples[i*8+2] = (drflac_int32)temp1L;
  7653. pOutputSamples[i*8+3] = (drflac_int32)temp1R;
  7654. pOutputSamples[i*8+4] = (drflac_int32)temp2L;
  7655. pOutputSamples[i*8+5] = (drflac_int32)temp2R;
  7656. pOutputSamples[i*8+6] = (drflac_int32)temp3L;
  7657. pOutputSamples[i*8+7] = (drflac_int32)temp3R;
  7658. }
  7659. } else {
  7660. for (i = 0; i < frameCount4; ++i) {
  7661. drflac_uint32 temp0L;
  7662. drflac_uint32 temp1L;
  7663. drflac_uint32 temp2L;
  7664. drflac_uint32 temp3L;
  7665. drflac_uint32 temp0R;
  7666. drflac_uint32 temp1R;
  7667. drflac_uint32 temp2R;
  7668. drflac_uint32 temp3R;
  7669. drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7670. drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7671. drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7672. drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7673. drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7674. drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7675. drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7676. drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7677. mid0 = (mid0 << 1) | (side0 & 0x01);
  7678. mid1 = (mid1 << 1) | (side1 & 0x01);
  7679. mid2 = (mid2 << 1) | (side2 & 0x01);
  7680. mid3 = (mid3 << 1) | (side3 & 0x01);
  7681. temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
  7682. temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
  7683. temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
  7684. temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
  7685. temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
  7686. temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
  7687. temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
  7688. temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
  7689. pOutputSamples[i*8+0] = (drflac_int32)temp0L;
  7690. pOutputSamples[i*8+1] = (drflac_int32)temp0R;
  7691. pOutputSamples[i*8+2] = (drflac_int32)temp1L;
  7692. pOutputSamples[i*8+3] = (drflac_int32)temp1R;
  7693. pOutputSamples[i*8+4] = (drflac_int32)temp2L;
  7694. pOutputSamples[i*8+5] = (drflac_int32)temp2R;
  7695. pOutputSamples[i*8+6] = (drflac_int32)temp3L;
  7696. pOutputSamples[i*8+7] = (drflac_int32)temp3R;
  7697. }
  7698. }
  7699. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7700. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7701. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7702. mid = (mid << 1) | (side & 0x01);
  7703. pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
  7704. pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
  7705. }
  7706. }
  7707. #if defined(DRFLAC_SUPPORT_SSE2)
  7708. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7709. {
  7710. drflac_uint64 i;
  7711. drflac_uint64 frameCount4 = frameCount >> 2;
  7712. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7713. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7714. drflac_int32 shift = unusedBitsPerSample;
  7715. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  7716. if (shift == 0) {
  7717. for (i = 0; i < frameCount4; ++i) {
  7718. __m128i mid;
  7719. __m128i side;
  7720. __m128i left;
  7721. __m128i right;
  7722. mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  7723. side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  7724. mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
  7725. left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
  7726. right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
  7727. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
  7728. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
  7729. }
  7730. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7731. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7732. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7733. mid = (mid << 1) | (side & 0x01);
  7734. pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
  7735. pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
  7736. }
  7737. } else {
  7738. shift -= 1;
  7739. for (i = 0; i < frameCount4; ++i) {
  7740. __m128i mid;
  7741. __m128i side;
  7742. __m128i left;
  7743. __m128i right;
  7744. mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  7745. side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  7746. mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
  7747. left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
  7748. right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
  7749. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
  7750. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
  7751. }
  7752. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7753. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7754. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7755. mid = (mid << 1) | (side & 0x01);
  7756. pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
  7757. pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
  7758. }
  7759. }
  7760. }
  7761. #endif
  7762. #if defined(DRFLAC_SUPPORT_NEON)
  7763. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7764. {
  7765. drflac_uint64 i;
  7766. drflac_uint64 frameCount4 = frameCount >> 2;
  7767. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7768. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7769. drflac_int32 shift = unusedBitsPerSample;
  7770. int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
  7771. int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
  7772. uint32x4_t one4;
  7773. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  7774. wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  7775. wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  7776. one4 = vdupq_n_u32(1);
  7777. if (shift == 0) {
  7778. for (i = 0; i < frameCount4; ++i) {
  7779. uint32x4_t mid;
  7780. uint32x4_t side;
  7781. int32x4_t left;
  7782. int32x4_t right;
  7783. mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
  7784. side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
  7785. mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
  7786. left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
  7787. right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
  7788. drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
  7789. }
  7790. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7791. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7792. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7793. mid = (mid << 1) | (side & 0x01);
  7794. pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
  7795. pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
  7796. }
  7797. } else {
  7798. int32x4_t shift4;
  7799. shift -= 1;
  7800. shift4 = vdupq_n_s32(shift);
  7801. for (i = 0; i < frameCount4; ++i) {
  7802. uint32x4_t mid;
  7803. uint32x4_t side;
  7804. int32x4_t left;
  7805. int32x4_t right;
  7806. mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
  7807. side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
  7808. mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
  7809. left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
  7810. right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
  7811. drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
  7812. }
  7813. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7814. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7815. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7816. mid = (mid << 1) | (side & 0x01);
  7817. pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
  7818. pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
  7819. }
  7820. }
  7821. }
  7822. #endif
  7823. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7824. {
  7825. #if defined(DRFLAC_SUPPORT_SSE2)
  7826. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  7827. drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7828. } else
  7829. #elif defined(DRFLAC_SUPPORT_NEON)
  7830. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  7831. drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7832. } else
  7833. #endif
  7834. {
  7835. /* Scalar fallback. */
  7836. #if 0
  7837. drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7838. #else
  7839. drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7840. #endif
  7841. }
  7842. }
  7843. #if 0
  7844. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7845. {
  7846. for (drflac_uint64 i = 0; i < frameCount; ++i) {
  7847. pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
  7848. pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
  7849. }
  7850. }
  7851. #endif
  7852. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7853. {
  7854. drflac_uint64 i;
  7855. drflac_uint64 frameCount4 = frameCount >> 2;
  7856. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7857. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7858. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7859. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7860. for (i = 0; i < frameCount4; ++i) {
  7861. drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
  7862. drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
  7863. drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
  7864. drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
  7865. drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
  7866. drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
  7867. drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
  7868. drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
  7869. pOutputSamples[i*8+0] = (drflac_int32)tempL0;
  7870. pOutputSamples[i*8+1] = (drflac_int32)tempR0;
  7871. pOutputSamples[i*8+2] = (drflac_int32)tempL1;
  7872. pOutputSamples[i*8+3] = (drflac_int32)tempR1;
  7873. pOutputSamples[i*8+4] = (drflac_int32)tempL2;
  7874. pOutputSamples[i*8+5] = (drflac_int32)tempR2;
  7875. pOutputSamples[i*8+6] = (drflac_int32)tempL3;
  7876. pOutputSamples[i*8+7] = (drflac_int32)tempR3;
  7877. }
  7878. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7879. pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
  7880. pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
  7881. }
  7882. }
  7883. #if defined(DRFLAC_SUPPORT_SSE2)
  7884. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7885. {
  7886. drflac_uint64 i;
  7887. drflac_uint64 frameCount4 = frameCount >> 2;
  7888. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7889. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7890. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7891. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7892. for (i = 0; i < frameCount4; ++i) {
  7893. __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  7894. __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  7895. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
  7896. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
  7897. }
  7898. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7899. pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
  7900. pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
  7901. }
  7902. }
  7903. #endif
  7904. #if defined(DRFLAC_SUPPORT_NEON)
  7905. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7906. {
  7907. drflac_uint64 i;
  7908. drflac_uint64 frameCount4 = frameCount >> 2;
  7909. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  7910. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  7911. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  7912. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  7913. int32x4_t shift4_0 = vdupq_n_s32(shift0);
  7914. int32x4_t shift4_1 = vdupq_n_s32(shift1);
  7915. for (i = 0; i < frameCount4; ++i) {
  7916. int32x4_t left;
  7917. int32x4_t right;
  7918. left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
  7919. right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));
  7920. drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
  7921. }
  7922. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  7923. pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
  7924. pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
  7925. }
  7926. }
  7927. #endif
  7928. static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
  7929. {
  7930. #if defined(DRFLAC_SUPPORT_SSE2)
  7931. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  7932. drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7933. } else
  7934. #elif defined(DRFLAC_SUPPORT_NEON)
  7935. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  7936. drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7937. } else
  7938. #endif
  7939. {
  7940. /* Scalar fallback. */
  7941. #if 0
  7942. drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7943. #else
  7944. drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  7945. #endif
  7946. }
  7947. }
  7948. DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
  7949. {
  7950. drflac_uint64 framesRead;
  7951. drflac_uint32 unusedBitsPerSample;
  7952. if (pFlac == NULL || framesToRead == 0) {
  7953. return 0;
  7954. }
  7955. if (pBufferOut == NULL) {
  7956. return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
  7957. }
  7958. DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
  7959. unusedBitsPerSample = 32 - pFlac->bitsPerSample;
  7960. framesRead = 0;
  7961. while (framesToRead > 0) {
  7962. /* If we've run out of samples in this frame, go to the next. */
  7963. if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
  7964. if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
  7965. break; /* Couldn't read the next frame, so just break from the loop and return. */
  7966. }
  7967. } else {
  7968. unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
  7969. drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
  7970. drflac_uint64 frameCountThisIteration = framesToRead;
  7971. if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
  7972. frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
  7973. }
  7974. if (channelCount == 2) {
  7975. const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
  7976. const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
  7977. switch (pFlac->currentFLACFrame.header.channelAssignment)
  7978. {
  7979. case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
  7980. {
  7981. drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  7982. } break;
  7983. case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
  7984. {
  7985. drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  7986. } break;
  7987. case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
  7988. {
  7989. drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  7990. } break;
  7991. case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
  7992. default:
  7993. {
  7994. drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  7995. } break;
  7996. }
  7997. } else {
  7998. /* Generic interleaving. */
  7999. drflac_uint64 i;
  8000. for (i = 0; i < frameCountThisIteration; ++i) {
  8001. unsigned int j;
  8002. for (j = 0; j < channelCount; ++j) {
  8003. pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
  8004. }
  8005. }
  8006. }
  8007. framesRead += frameCountThisIteration;
  8008. pBufferOut += frameCountThisIteration * channelCount;
  8009. framesToRead -= frameCountThisIteration;
  8010. pFlac->currentPCMFrame += frameCountThisIteration;
  8011. pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
  8012. }
  8013. }
  8014. return framesRead;
  8015. }
  8016. #if 0
  8017. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8018. {
  8019. drflac_uint64 i;
  8020. for (i = 0; i < frameCount; ++i) {
  8021. drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  8022. drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  8023. drflac_uint32 right = left - side;
  8024. left >>= 16;
  8025. right >>= 16;
  8026. pOutputSamples[i*2+0] = (drflac_int16)left;
  8027. pOutputSamples[i*2+1] = (drflac_int16)right;
  8028. }
  8029. }
  8030. #endif
  8031. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8032. {
  8033. drflac_uint64 i;
  8034. drflac_uint64 frameCount4 = frameCount >> 2;
  8035. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8036. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8037. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8038. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8039. for (i = 0; i < frameCount4; ++i) {
  8040. drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
  8041. drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
  8042. drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
  8043. drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
  8044. drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
  8045. drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
  8046. drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
  8047. drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
  8048. drflac_uint32 right0 = left0 - side0;
  8049. drflac_uint32 right1 = left1 - side1;
  8050. drflac_uint32 right2 = left2 - side2;
  8051. drflac_uint32 right3 = left3 - side3;
  8052. left0 >>= 16;
  8053. left1 >>= 16;
  8054. left2 >>= 16;
  8055. left3 >>= 16;
  8056. right0 >>= 16;
  8057. right1 >>= 16;
  8058. right2 >>= 16;
  8059. right3 >>= 16;
  8060. pOutputSamples[i*8+0] = (drflac_int16)left0;
  8061. pOutputSamples[i*8+1] = (drflac_int16)right0;
  8062. pOutputSamples[i*8+2] = (drflac_int16)left1;
  8063. pOutputSamples[i*8+3] = (drflac_int16)right1;
  8064. pOutputSamples[i*8+4] = (drflac_int16)left2;
  8065. pOutputSamples[i*8+5] = (drflac_int16)right2;
  8066. pOutputSamples[i*8+6] = (drflac_int16)left3;
  8067. pOutputSamples[i*8+7] = (drflac_int16)right3;
  8068. }
  8069. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8070. drflac_uint32 left = pInputSamples0U32[i] << shift0;
  8071. drflac_uint32 side = pInputSamples1U32[i] << shift1;
  8072. drflac_uint32 right = left - side;
  8073. left >>= 16;
  8074. right >>= 16;
  8075. pOutputSamples[i*2+0] = (drflac_int16)left;
  8076. pOutputSamples[i*2+1] = (drflac_int16)right;
  8077. }
  8078. }
  8079. #if defined(DRFLAC_SUPPORT_SSE2)
  8080. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8081. {
  8082. drflac_uint64 i;
  8083. drflac_uint64 frameCount4 = frameCount >> 2;
  8084. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8085. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8086. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8087. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8088. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  8089. for (i = 0; i < frameCount4; ++i) {
  8090. __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  8091. __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  8092. __m128i right = _mm_sub_epi32(left, side);
  8093. left = _mm_srai_epi32(left, 16);
  8094. right = _mm_srai_epi32(right, 16);
  8095. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
  8096. }
  8097. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8098. drflac_uint32 left = pInputSamples0U32[i] << shift0;
  8099. drflac_uint32 side = pInputSamples1U32[i] << shift1;
  8100. drflac_uint32 right = left - side;
  8101. left >>= 16;
  8102. right >>= 16;
  8103. pOutputSamples[i*2+0] = (drflac_int16)left;
  8104. pOutputSamples[i*2+1] = (drflac_int16)right;
  8105. }
  8106. }
  8107. #endif
  8108. #if defined(DRFLAC_SUPPORT_NEON)
  8109. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8110. {
  8111. drflac_uint64 i;
  8112. drflac_uint64 frameCount4 = frameCount >> 2;
  8113. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8114. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8115. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8116. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8117. int32x4_t shift0_4;
  8118. int32x4_t shift1_4;
  8119. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  8120. shift0_4 = vdupq_n_s32(shift0);
  8121. shift1_4 = vdupq_n_s32(shift1);
  8122. for (i = 0; i < frameCount4; ++i) {
  8123. uint32x4_t left;
  8124. uint32x4_t side;
  8125. uint32x4_t right;
  8126. left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
  8127. side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
  8128. right = vsubq_u32(left, side);
  8129. left = vshrq_n_u32(left, 16);
  8130. right = vshrq_n_u32(right, 16);
  8131. drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
  8132. }
  8133. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8134. drflac_uint32 left = pInputSamples0U32[i] << shift0;
  8135. drflac_uint32 side = pInputSamples1U32[i] << shift1;
  8136. drflac_uint32 right = left - side;
  8137. left >>= 16;
  8138. right >>= 16;
  8139. pOutputSamples[i*2+0] = (drflac_int16)left;
  8140. pOutputSamples[i*2+1] = (drflac_int16)right;
  8141. }
  8142. }
  8143. #endif
  8144. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8145. {
  8146. #if defined(DRFLAC_SUPPORT_SSE2)
  8147. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  8148. drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8149. } else
  8150. #elif defined(DRFLAC_SUPPORT_NEON)
  8151. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  8152. drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8153. } else
  8154. #endif
  8155. {
  8156. /* Scalar fallback. */
  8157. #if 0
  8158. drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8159. #else
  8160. drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8161. #endif
  8162. }
  8163. }
  8164. #if 0
  8165. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8166. {
  8167. drflac_uint64 i;
  8168. for (i = 0; i < frameCount; ++i) {
  8169. drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  8170. drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  8171. drflac_uint32 left = right + side;
  8172. left >>= 16;
  8173. right >>= 16;
  8174. pOutputSamples[i*2+0] = (drflac_int16)left;
  8175. pOutputSamples[i*2+1] = (drflac_int16)right;
  8176. }
  8177. }
  8178. #endif
  8179. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8180. {
  8181. drflac_uint64 i;
  8182. drflac_uint64 frameCount4 = frameCount >> 2;
  8183. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8184. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8185. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8186. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8187. for (i = 0; i < frameCount4; ++i) {
  8188. drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
  8189. drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
  8190. drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
  8191. drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
  8192. drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
  8193. drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
  8194. drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
  8195. drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
  8196. drflac_uint32 left0 = right0 + side0;
  8197. drflac_uint32 left1 = right1 + side1;
  8198. drflac_uint32 left2 = right2 + side2;
  8199. drflac_uint32 left3 = right3 + side3;
  8200. left0 >>= 16;
  8201. left1 >>= 16;
  8202. left2 >>= 16;
  8203. left3 >>= 16;
  8204. right0 >>= 16;
  8205. right1 >>= 16;
  8206. right2 >>= 16;
  8207. right3 >>= 16;
  8208. pOutputSamples[i*8+0] = (drflac_int16)left0;
  8209. pOutputSamples[i*8+1] = (drflac_int16)right0;
  8210. pOutputSamples[i*8+2] = (drflac_int16)left1;
  8211. pOutputSamples[i*8+3] = (drflac_int16)right1;
  8212. pOutputSamples[i*8+4] = (drflac_int16)left2;
  8213. pOutputSamples[i*8+5] = (drflac_int16)right2;
  8214. pOutputSamples[i*8+6] = (drflac_int16)left3;
  8215. pOutputSamples[i*8+7] = (drflac_int16)right3;
  8216. }
  8217. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8218. drflac_uint32 side = pInputSamples0U32[i] << shift0;
  8219. drflac_uint32 right = pInputSamples1U32[i] << shift1;
  8220. drflac_uint32 left = right + side;
  8221. left >>= 16;
  8222. right >>= 16;
  8223. pOutputSamples[i*2+0] = (drflac_int16)left;
  8224. pOutputSamples[i*2+1] = (drflac_int16)right;
  8225. }
  8226. }
  8227. #if defined(DRFLAC_SUPPORT_SSE2)
  8228. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8229. {
  8230. drflac_uint64 i;
  8231. drflac_uint64 frameCount4 = frameCount >> 2;
  8232. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8233. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8234. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8235. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8236. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  8237. for (i = 0; i < frameCount4; ++i) {
  8238. __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  8239. __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  8240. __m128i left = _mm_add_epi32(right, side);
  8241. left = _mm_srai_epi32(left, 16);
  8242. right = _mm_srai_epi32(right, 16);
  8243. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
  8244. }
  8245. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8246. drflac_uint32 side = pInputSamples0U32[i] << shift0;
  8247. drflac_uint32 right = pInputSamples1U32[i] << shift1;
  8248. drflac_uint32 left = right + side;
  8249. left >>= 16;
  8250. right >>= 16;
  8251. pOutputSamples[i*2+0] = (drflac_int16)left;
  8252. pOutputSamples[i*2+1] = (drflac_int16)right;
  8253. }
  8254. }
  8255. #endif
  8256. #if defined(DRFLAC_SUPPORT_NEON)
  8257. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8258. {
  8259. drflac_uint64 i;
  8260. drflac_uint64 frameCount4 = frameCount >> 2;
  8261. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8262. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8263. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8264. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8265. int32x4_t shift0_4;
  8266. int32x4_t shift1_4;
  8267. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  8268. shift0_4 = vdupq_n_s32(shift0);
  8269. shift1_4 = vdupq_n_s32(shift1);
  8270. for (i = 0; i < frameCount4; ++i) {
  8271. uint32x4_t side;
  8272. uint32x4_t right;
  8273. uint32x4_t left;
  8274. side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
  8275. right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
  8276. left = vaddq_u32(right, side);
  8277. left = vshrq_n_u32(left, 16);
  8278. right = vshrq_n_u32(right, 16);
  8279. drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
  8280. }
  8281. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8282. drflac_uint32 side = pInputSamples0U32[i] << shift0;
  8283. drflac_uint32 right = pInputSamples1U32[i] << shift1;
  8284. drflac_uint32 left = right + side;
  8285. left >>= 16;
  8286. right >>= 16;
  8287. pOutputSamples[i*2+0] = (drflac_int16)left;
  8288. pOutputSamples[i*2+1] = (drflac_int16)right;
  8289. }
  8290. }
  8291. #endif
  8292. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8293. {
  8294. #if defined(DRFLAC_SUPPORT_SSE2)
  8295. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  8296. drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8297. } else
  8298. #elif defined(DRFLAC_SUPPORT_NEON)
  8299. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  8300. drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8301. } else
  8302. #endif
  8303. {
  8304. /* Scalar fallback. */
  8305. #if 0
  8306. drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8307. #else
  8308. drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8309. #endif
  8310. }
  8311. }
  8312. #if 0
  8313. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8314. {
  8315. for (drflac_uint64 i = 0; i < frameCount; ++i) {
  8316. drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8317. drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8318. mid = (mid << 1) | (side & 0x01);
  8319. pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
  8320. pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
  8321. }
  8322. }
  8323. #endif
  8324. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8325. {
  8326. drflac_uint64 i;
  8327. drflac_uint64 frameCount4 = frameCount >> 2;
  8328. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8329. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8330. drflac_uint32 shift = unusedBitsPerSample;
  8331. if (shift > 0) {
  8332. shift -= 1;
  8333. for (i = 0; i < frameCount4; ++i) {
  8334. drflac_uint32 temp0L;
  8335. drflac_uint32 temp1L;
  8336. drflac_uint32 temp2L;
  8337. drflac_uint32 temp3L;
  8338. drflac_uint32 temp0R;
  8339. drflac_uint32 temp1R;
  8340. drflac_uint32 temp2R;
  8341. drflac_uint32 temp3R;
  8342. drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8343. drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8344. drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8345. drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8346. drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8347. drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8348. drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8349. drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8350. mid0 = (mid0 << 1) | (side0 & 0x01);
  8351. mid1 = (mid1 << 1) | (side1 & 0x01);
  8352. mid2 = (mid2 << 1) | (side2 & 0x01);
  8353. mid3 = (mid3 << 1) | (side3 & 0x01);
  8354. temp0L = (mid0 + side0) << shift;
  8355. temp1L = (mid1 + side1) << shift;
  8356. temp2L = (mid2 + side2) << shift;
  8357. temp3L = (mid3 + side3) << shift;
  8358. temp0R = (mid0 - side0) << shift;
  8359. temp1R = (mid1 - side1) << shift;
  8360. temp2R = (mid2 - side2) << shift;
  8361. temp3R = (mid3 - side3) << shift;
  8362. temp0L >>= 16;
  8363. temp1L >>= 16;
  8364. temp2L >>= 16;
  8365. temp3L >>= 16;
  8366. temp0R >>= 16;
  8367. temp1R >>= 16;
  8368. temp2R >>= 16;
  8369. temp3R >>= 16;
  8370. pOutputSamples[i*8+0] = (drflac_int16)temp0L;
  8371. pOutputSamples[i*8+1] = (drflac_int16)temp0R;
  8372. pOutputSamples[i*8+2] = (drflac_int16)temp1L;
  8373. pOutputSamples[i*8+3] = (drflac_int16)temp1R;
  8374. pOutputSamples[i*8+4] = (drflac_int16)temp2L;
  8375. pOutputSamples[i*8+5] = (drflac_int16)temp2R;
  8376. pOutputSamples[i*8+6] = (drflac_int16)temp3L;
  8377. pOutputSamples[i*8+7] = (drflac_int16)temp3R;
  8378. }
  8379. } else {
  8380. for (i = 0; i < frameCount4; ++i) {
  8381. drflac_uint32 temp0L;
  8382. drflac_uint32 temp1L;
  8383. drflac_uint32 temp2L;
  8384. drflac_uint32 temp3L;
  8385. drflac_uint32 temp0R;
  8386. drflac_uint32 temp1R;
  8387. drflac_uint32 temp2R;
  8388. drflac_uint32 temp3R;
  8389. drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8390. drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8391. drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8392. drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8393. drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8394. drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8395. drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8396. drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8397. mid0 = (mid0 << 1) | (side0 & 0x01);
  8398. mid1 = (mid1 << 1) | (side1 & 0x01);
  8399. mid2 = (mid2 << 1) | (side2 & 0x01);
  8400. mid3 = (mid3 << 1) | (side3 & 0x01);
  8401. temp0L = ((drflac_int32)(mid0 + side0) >> 1);
  8402. temp1L = ((drflac_int32)(mid1 + side1) >> 1);
  8403. temp2L = ((drflac_int32)(mid2 + side2) >> 1);
  8404. temp3L = ((drflac_int32)(mid3 + side3) >> 1);
  8405. temp0R = ((drflac_int32)(mid0 - side0) >> 1);
  8406. temp1R = ((drflac_int32)(mid1 - side1) >> 1);
  8407. temp2R = ((drflac_int32)(mid2 - side2) >> 1);
  8408. temp3R = ((drflac_int32)(mid3 - side3) >> 1);
  8409. temp0L >>= 16;
  8410. temp1L >>= 16;
  8411. temp2L >>= 16;
  8412. temp3L >>= 16;
  8413. temp0R >>= 16;
  8414. temp1R >>= 16;
  8415. temp2R >>= 16;
  8416. temp3R >>= 16;
  8417. pOutputSamples[i*8+0] = (drflac_int16)temp0L;
  8418. pOutputSamples[i*8+1] = (drflac_int16)temp0R;
  8419. pOutputSamples[i*8+2] = (drflac_int16)temp1L;
  8420. pOutputSamples[i*8+3] = (drflac_int16)temp1R;
  8421. pOutputSamples[i*8+4] = (drflac_int16)temp2L;
  8422. pOutputSamples[i*8+5] = (drflac_int16)temp2R;
  8423. pOutputSamples[i*8+6] = (drflac_int16)temp3L;
  8424. pOutputSamples[i*8+7] = (drflac_int16)temp3R;
  8425. }
  8426. }
  8427. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8428. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8429. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8430. mid = (mid << 1) | (side & 0x01);
  8431. pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
  8432. pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
  8433. }
  8434. }
  8435. #if defined(DRFLAC_SUPPORT_SSE2)
  8436. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8437. {
  8438. drflac_uint64 i;
  8439. drflac_uint64 frameCount4 = frameCount >> 2;
  8440. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8441. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8442. drflac_uint32 shift = unusedBitsPerSample;
  8443. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  8444. if (shift == 0) {
  8445. for (i = 0; i < frameCount4; ++i) {
  8446. __m128i mid;
  8447. __m128i side;
  8448. __m128i left;
  8449. __m128i right;
  8450. mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  8451. side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  8452. mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
  8453. left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
  8454. right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
  8455. left = _mm_srai_epi32(left, 16);
  8456. right = _mm_srai_epi32(right, 16);
  8457. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
  8458. }
  8459. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8460. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8461. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8462. mid = (mid << 1) | (side & 0x01);
  8463. pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
  8464. pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
  8465. }
  8466. } else {
  8467. shift -= 1;
  8468. for (i = 0; i < frameCount4; ++i) {
  8469. __m128i mid;
  8470. __m128i side;
  8471. __m128i left;
  8472. __m128i right;
  8473. mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  8474. side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  8475. mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
  8476. left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
  8477. right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
  8478. left = _mm_srai_epi32(left, 16);
  8479. right = _mm_srai_epi32(right, 16);
  8480. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
  8481. }
  8482. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8483. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8484. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8485. mid = (mid << 1) | (side & 0x01);
  8486. pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
  8487. pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
  8488. }
  8489. }
  8490. }
  8491. #endif
  8492. #if defined(DRFLAC_SUPPORT_NEON)
  8493. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8494. {
  8495. drflac_uint64 i;
  8496. drflac_uint64 frameCount4 = frameCount >> 2;
  8497. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8498. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8499. drflac_uint32 shift = unusedBitsPerSample;
  8500. int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
  8501. int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
  8502. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  8503. wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  8504. wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  8505. if (shift == 0) {
  8506. for (i = 0; i < frameCount4; ++i) {
  8507. uint32x4_t mid;
  8508. uint32x4_t side;
  8509. int32x4_t left;
  8510. int32x4_t right;
  8511. mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
  8512. side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
  8513. mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
  8514. left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
  8515. right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
  8516. left = vshrq_n_s32(left, 16);
  8517. right = vshrq_n_s32(right, 16);
  8518. drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
  8519. }
  8520. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8521. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8522. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8523. mid = (mid << 1) | (side & 0x01);
  8524. pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
  8525. pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
  8526. }
  8527. } else {
  8528. int32x4_t shift4;
  8529. shift -= 1;
  8530. shift4 = vdupq_n_s32(shift);
  8531. for (i = 0; i < frameCount4; ++i) {
  8532. uint32x4_t mid;
  8533. uint32x4_t side;
  8534. int32x4_t left;
  8535. int32x4_t right;
  8536. mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
  8537. side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
  8538. mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
  8539. left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
  8540. right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
  8541. left = vshrq_n_s32(left, 16);
  8542. right = vshrq_n_s32(right, 16);
  8543. drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
  8544. }
  8545. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8546. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8547. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8548. mid = (mid << 1) | (side & 0x01);
  8549. pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
  8550. pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
  8551. }
  8552. }
  8553. }
  8554. #endif
  8555. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8556. {
  8557. #if defined(DRFLAC_SUPPORT_SSE2)
  8558. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  8559. drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8560. } else
  8561. #elif defined(DRFLAC_SUPPORT_NEON)
  8562. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  8563. drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8564. } else
  8565. #endif
  8566. {
  8567. /* Scalar fallback. */
  8568. #if 0
  8569. drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8570. #else
  8571. drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8572. #endif
  8573. }
  8574. }
  8575. #if 0
  8576. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8577. {
  8578. for (drflac_uint64 i = 0; i < frameCount; ++i) {
  8579. pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
  8580. pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
  8581. }
  8582. }
  8583. #endif
  8584. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8585. {
  8586. drflac_uint64 i;
  8587. drflac_uint64 frameCount4 = frameCount >> 2;
  8588. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8589. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8590. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8591. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8592. for (i = 0; i < frameCount4; ++i) {
  8593. drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
  8594. drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
  8595. drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
  8596. drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
  8597. drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
  8598. drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
  8599. drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
  8600. drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
  8601. tempL0 >>= 16;
  8602. tempL1 >>= 16;
  8603. tempL2 >>= 16;
  8604. tempL3 >>= 16;
  8605. tempR0 >>= 16;
  8606. tempR1 >>= 16;
  8607. tempR2 >>= 16;
  8608. tempR3 >>= 16;
  8609. pOutputSamples[i*8+0] = (drflac_int16)tempL0;
  8610. pOutputSamples[i*8+1] = (drflac_int16)tempR0;
  8611. pOutputSamples[i*8+2] = (drflac_int16)tempL1;
  8612. pOutputSamples[i*8+3] = (drflac_int16)tempR1;
  8613. pOutputSamples[i*8+4] = (drflac_int16)tempL2;
  8614. pOutputSamples[i*8+5] = (drflac_int16)tempR2;
  8615. pOutputSamples[i*8+6] = (drflac_int16)tempL3;
  8616. pOutputSamples[i*8+7] = (drflac_int16)tempR3;
  8617. }
  8618. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8619. pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
  8620. pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
  8621. }
  8622. }
  8623. #if defined(DRFLAC_SUPPORT_SSE2)
  8624. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8625. {
  8626. drflac_uint64 i;
  8627. drflac_uint64 frameCount4 = frameCount >> 2;
  8628. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8629. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8630. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8631. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8632. for (i = 0; i < frameCount4; ++i) {
  8633. __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  8634. __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  8635. left = _mm_srai_epi32(left, 16);
  8636. right = _mm_srai_epi32(right, 16);
  8637. /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store the in the output buffer. */
  8638. _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
  8639. }
  8640. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8641. pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
  8642. pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
  8643. }
  8644. }
  8645. #endif
  8646. #if defined(DRFLAC_SUPPORT_NEON)
  8647. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8648. {
  8649. drflac_uint64 i;
  8650. drflac_uint64 frameCount4 = frameCount >> 2;
  8651. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8652. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8653. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8654. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8655. int32x4_t shift0_4 = vdupq_n_s32(shift0);
  8656. int32x4_t shift1_4 = vdupq_n_s32(shift1);
  8657. for (i = 0; i < frameCount4; ++i) {
  8658. int32x4_t left;
  8659. int32x4_t right;
  8660. left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
  8661. right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
  8662. left = vshrq_n_s32(left, 16);
  8663. right = vshrq_n_s32(right, 16);
  8664. drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
  8665. }
  8666. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8667. pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
  8668. pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
  8669. }
  8670. }
  8671. #endif
  8672. static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
  8673. {
  8674. #if defined(DRFLAC_SUPPORT_SSE2)
  8675. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  8676. drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8677. } else
  8678. #elif defined(DRFLAC_SUPPORT_NEON)
  8679. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  8680. drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8681. } else
  8682. #endif
  8683. {
  8684. /* Scalar fallback. */
  8685. #if 0
  8686. drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8687. #else
  8688. drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8689. #endif
  8690. }
  8691. }
  8692. DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
  8693. {
  8694. drflac_uint64 framesRead;
  8695. drflac_uint32 unusedBitsPerSample;
  8696. if (pFlac == NULL || framesToRead == 0) {
  8697. return 0;
  8698. }
  8699. if (pBufferOut == NULL) {
  8700. return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
  8701. }
  8702. DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
  8703. unusedBitsPerSample = 32 - pFlac->bitsPerSample;
  8704. framesRead = 0;
  8705. while (framesToRead > 0) {
  8706. /* If we've run out of samples in this frame, go to the next. */
  8707. if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
  8708. if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
  8709. break; /* Couldn't read the next frame, so just break from the loop and return. */
  8710. }
  8711. } else {
  8712. unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
  8713. drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
  8714. drflac_uint64 frameCountThisIteration = framesToRead;
  8715. if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
  8716. frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
  8717. }
  8718. if (channelCount == 2) {
  8719. const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
  8720. const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
  8721. switch (pFlac->currentFLACFrame.header.channelAssignment)
  8722. {
  8723. case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
  8724. {
  8725. drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  8726. } break;
  8727. case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
  8728. {
  8729. drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  8730. } break;
  8731. case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
  8732. {
  8733. drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  8734. } break;
  8735. case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
  8736. default:
  8737. {
  8738. drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  8739. } break;
  8740. }
  8741. } else {
  8742. /* Generic interleaving. */
  8743. drflac_uint64 i;
  8744. for (i = 0; i < frameCountThisIteration; ++i) {
  8745. unsigned int j;
  8746. for (j = 0; j < channelCount; ++j) {
  8747. drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
  8748. pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
  8749. }
  8750. }
  8751. }
  8752. framesRead += frameCountThisIteration;
  8753. pBufferOut += frameCountThisIteration * channelCount;
  8754. framesToRead -= frameCountThisIteration;
  8755. pFlac->currentPCMFrame += frameCountThisIteration;
  8756. pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
  8757. }
  8758. }
  8759. return framesRead;
  8760. }
  8761. #if 0
  8762. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  8763. {
  8764. drflac_uint64 i;
  8765. for (i = 0; i < frameCount; ++i) {
  8766. drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  8767. drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  8768. drflac_uint32 right = left - side;
  8769. pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
  8770. pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
  8771. }
  8772. }
  8773. #endif
  8774. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  8775. {
  8776. drflac_uint64 i;
  8777. drflac_uint64 frameCount4 = frameCount >> 2;
  8778. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8779. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8780. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8781. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8782. float factor = 1 / 2147483648.0;
  8783. for (i = 0; i < frameCount4; ++i) {
  8784. drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
  8785. drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
  8786. drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
  8787. drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
  8788. drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
  8789. drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
  8790. drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
  8791. drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
  8792. drflac_uint32 right0 = left0 - side0;
  8793. drflac_uint32 right1 = left1 - side1;
  8794. drflac_uint32 right2 = left2 - side2;
  8795. drflac_uint32 right3 = left3 - side3;
  8796. pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
  8797. pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
  8798. pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
  8799. pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
  8800. pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
  8801. pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
  8802. pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
  8803. pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
  8804. }
  8805. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8806. drflac_uint32 left = pInputSamples0U32[i] << shift0;
  8807. drflac_uint32 side = pInputSamples1U32[i] << shift1;
  8808. drflac_uint32 right = left - side;
  8809. pOutputSamples[i*2+0] = (drflac_int32)left * factor;
  8810. pOutputSamples[i*2+1] = (drflac_int32)right * factor;
  8811. }
  8812. }
  8813. #if defined(DRFLAC_SUPPORT_SSE2)
  8814. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  8815. {
  8816. drflac_uint64 i;
  8817. drflac_uint64 frameCount4 = frameCount >> 2;
  8818. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8819. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8820. drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  8821. drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  8822. __m128 factor;
  8823. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  8824. factor = _mm_set1_ps(1.0f / 8388608.0f);
  8825. for (i = 0; i < frameCount4; ++i) {
  8826. __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  8827. __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  8828. __m128i right = _mm_sub_epi32(left, side);
  8829. __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
  8830. __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
  8831. _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
  8832. _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
  8833. }
  8834. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8835. drflac_uint32 left = pInputSamples0U32[i] << shift0;
  8836. drflac_uint32 side = pInputSamples1U32[i] << shift1;
  8837. drflac_uint32 right = left - side;
  8838. pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
  8839. pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
  8840. }
  8841. }
  8842. #endif
  8843. #if defined(DRFLAC_SUPPORT_NEON)
  8844. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  8845. {
  8846. drflac_uint64 i;
  8847. drflac_uint64 frameCount4 = frameCount >> 2;
  8848. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8849. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8850. drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  8851. drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  8852. float32x4_t factor4;
  8853. int32x4_t shift0_4;
  8854. int32x4_t shift1_4;
  8855. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  8856. factor4 = vdupq_n_f32(1.0f / 8388608.0f);
  8857. shift0_4 = vdupq_n_s32(shift0);
  8858. shift1_4 = vdupq_n_s32(shift1);
  8859. for (i = 0; i < frameCount4; ++i) {
  8860. uint32x4_t left;
  8861. uint32x4_t side;
  8862. uint32x4_t right;
  8863. float32x4_t leftf;
  8864. float32x4_t rightf;
  8865. left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
  8866. side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
  8867. right = vsubq_u32(left, side);
  8868. leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
  8869. rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
  8870. drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
  8871. }
  8872. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8873. drflac_uint32 left = pInputSamples0U32[i] << shift0;
  8874. drflac_uint32 side = pInputSamples1U32[i] << shift1;
  8875. drflac_uint32 right = left - side;
  8876. pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
  8877. pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
  8878. }
  8879. }
  8880. #endif
  8881. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  8882. {
  8883. #if defined(DRFLAC_SUPPORT_SSE2)
  8884. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  8885. drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8886. } else
  8887. #elif defined(DRFLAC_SUPPORT_NEON)
  8888. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  8889. drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8890. } else
  8891. #endif
  8892. {
  8893. /* Scalar fallback. */
  8894. #if 0
  8895. drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8896. #else
  8897. drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  8898. #endif
  8899. }
  8900. }
  8901. #if 0
  8902. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  8903. {
  8904. drflac_uint64 i;
  8905. for (i = 0; i < frameCount; ++i) {
  8906. drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  8907. drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  8908. drflac_uint32 left = right + side;
  8909. pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
  8910. pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
  8911. }
  8912. }
  8913. #endif
  8914. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  8915. {
  8916. drflac_uint64 i;
  8917. drflac_uint64 frameCount4 = frameCount >> 2;
  8918. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8919. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8920. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  8921. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  8922. float factor = 1 / 2147483648.0;
  8923. for (i = 0; i < frameCount4; ++i) {
  8924. drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
  8925. drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
  8926. drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
  8927. drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
  8928. drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
  8929. drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
  8930. drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
  8931. drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
  8932. drflac_uint32 left0 = right0 + side0;
  8933. drflac_uint32 left1 = right1 + side1;
  8934. drflac_uint32 left2 = right2 + side2;
  8935. drflac_uint32 left3 = right3 + side3;
  8936. pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
  8937. pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
  8938. pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
  8939. pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
  8940. pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
  8941. pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
  8942. pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
  8943. pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
  8944. }
  8945. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8946. drflac_uint32 side = pInputSamples0U32[i] << shift0;
  8947. drflac_uint32 right = pInputSamples1U32[i] << shift1;
  8948. drflac_uint32 left = right + side;
  8949. pOutputSamples[i*2+0] = (drflac_int32)left * factor;
  8950. pOutputSamples[i*2+1] = (drflac_int32)right * factor;
  8951. }
  8952. }
  8953. #if defined(DRFLAC_SUPPORT_SSE2)
  8954. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  8955. {
  8956. drflac_uint64 i;
  8957. drflac_uint64 frameCount4 = frameCount >> 2;
  8958. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8959. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8960. drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  8961. drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  8962. __m128 factor;
  8963. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  8964. factor = _mm_set1_ps(1.0f / 8388608.0f);
  8965. for (i = 0; i < frameCount4; ++i) {
  8966. __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  8967. __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  8968. __m128i left = _mm_add_epi32(right, side);
  8969. __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
  8970. __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
  8971. _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
  8972. _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
  8973. }
  8974. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  8975. drflac_uint32 side = pInputSamples0U32[i] << shift0;
  8976. drflac_uint32 right = pInputSamples1U32[i] << shift1;
  8977. drflac_uint32 left = right + side;
  8978. pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
  8979. pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
  8980. }
  8981. }
  8982. #endif
  8983. #if defined(DRFLAC_SUPPORT_NEON)
  8984. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  8985. {
  8986. drflac_uint64 i;
  8987. drflac_uint64 frameCount4 = frameCount >> 2;
  8988. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  8989. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  8990. drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  8991. drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  8992. float32x4_t factor4;
  8993. int32x4_t shift0_4;
  8994. int32x4_t shift1_4;
  8995. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  8996. factor4 = vdupq_n_f32(1.0f / 8388608.0f);
  8997. shift0_4 = vdupq_n_s32(shift0);
  8998. shift1_4 = vdupq_n_s32(shift1);
  8999. for (i = 0; i < frameCount4; ++i) {
  9000. uint32x4_t side;
  9001. uint32x4_t right;
  9002. uint32x4_t left;
  9003. float32x4_t leftf;
  9004. float32x4_t rightf;
  9005. side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
  9006. right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
  9007. left = vaddq_u32(right, side);
  9008. leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
  9009. rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
  9010. drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
  9011. }
  9012. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  9013. drflac_uint32 side = pInputSamples0U32[i] << shift0;
  9014. drflac_uint32 right = pInputSamples1U32[i] << shift1;
  9015. drflac_uint32 left = right + side;
  9016. pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
  9017. pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
  9018. }
  9019. }
  9020. #endif
  9021. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  9022. {
  9023. #if defined(DRFLAC_SUPPORT_SSE2)
  9024. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  9025. drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9026. } else
  9027. #elif defined(DRFLAC_SUPPORT_NEON)
  9028. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  9029. drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9030. } else
  9031. #endif
  9032. {
  9033. /* Scalar fallback. */
  9034. #if 0
  9035. drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9036. #else
  9037. drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9038. #endif
  9039. }
  9040. }
  9041. #if 0
  9042. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  9043. {
  9044. for (drflac_uint64 i = 0; i < frameCount; ++i) {
  9045. drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9046. drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9047. mid = (mid << 1) | (side & 0x01);
  9048. pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
  9049. pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
  9050. }
  9051. }
  9052. #endif
  9053. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  9054. {
  9055. drflac_uint64 i;
  9056. drflac_uint64 frameCount4 = frameCount >> 2;
  9057. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  9058. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  9059. drflac_uint32 shift = unusedBitsPerSample;
  9060. float factor = 1 / 2147483648.0;
  9061. if (shift > 0) {
  9062. shift -= 1;
  9063. for (i = 0; i < frameCount4; ++i) {
  9064. drflac_uint32 temp0L;
  9065. drflac_uint32 temp1L;
  9066. drflac_uint32 temp2L;
  9067. drflac_uint32 temp3L;
  9068. drflac_uint32 temp0R;
  9069. drflac_uint32 temp1R;
  9070. drflac_uint32 temp2R;
  9071. drflac_uint32 temp3R;
  9072. drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9073. drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9074. drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9075. drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9076. drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9077. drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9078. drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9079. drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9080. mid0 = (mid0 << 1) | (side0 & 0x01);
  9081. mid1 = (mid1 << 1) | (side1 & 0x01);
  9082. mid2 = (mid2 << 1) | (side2 & 0x01);
  9083. mid3 = (mid3 << 1) | (side3 & 0x01);
  9084. temp0L = (mid0 + side0) << shift;
  9085. temp1L = (mid1 + side1) << shift;
  9086. temp2L = (mid2 + side2) << shift;
  9087. temp3L = (mid3 + side3) << shift;
  9088. temp0R = (mid0 - side0) << shift;
  9089. temp1R = (mid1 - side1) << shift;
  9090. temp2R = (mid2 - side2) << shift;
  9091. temp3R = (mid3 - side3) << shift;
  9092. pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
  9093. pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
  9094. pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
  9095. pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
  9096. pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
  9097. pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
  9098. pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
  9099. pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
  9100. }
  9101. } else {
  9102. for (i = 0; i < frameCount4; ++i) {
  9103. drflac_uint32 temp0L;
  9104. drflac_uint32 temp1L;
  9105. drflac_uint32 temp2L;
  9106. drflac_uint32 temp3L;
  9107. drflac_uint32 temp0R;
  9108. drflac_uint32 temp1R;
  9109. drflac_uint32 temp2R;
  9110. drflac_uint32 temp3R;
  9111. drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9112. drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9113. drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9114. drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9115. drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9116. drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9117. drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9118. drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9119. mid0 = (mid0 << 1) | (side0 & 0x01);
  9120. mid1 = (mid1 << 1) | (side1 & 0x01);
  9121. mid2 = (mid2 << 1) | (side2 & 0x01);
  9122. mid3 = (mid3 << 1) | (side3 & 0x01);
  9123. temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
  9124. temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
  9125. temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
  9126. temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
  9127. temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
  9128. temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
  9129. temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
  9130. temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
  9131. pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
  9132. pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
  9133. pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
  9134. pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
  9135. pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
  9136. pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
  9137. pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
  9138. pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
  9139. }
  9140. }
  9141. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  9142. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9143. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9144. mid = (mid << 1) | (side & 0x01);
  9145. pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
  9146. pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
  9147. }
  9148. }
  9149. #if defined(DRFLAC_SUPPORT_SSE2)
  9150. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  9151. {
  9152. drflac_uint64 i;
  9153. drflac_uint64 frameCount4 = frameCount >> 2;
  9154. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  9155. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  9156. drflac_uint32 shift = unusedBitsPerSample - 8;
  9157. float factor;
  9158. __m128 factor128;
  9159. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  9160. factor = 1.0f / 8388608.0f;
  9161. factor128 = _mm_set1_ps(factor);
  9162. if (shift == 0) {
  9163. for (i = 0; i < frameCount4; ++i) {
  9164. __m128i mid;
  9165. __m128i side;
  9166. __m128i tempL;
  9167. __m128i tempR;
  9168. __m128 leftf;
  9169. __m128 rightf;
  9170. mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  9171. side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  9172. mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
  9173. tempL = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
  9174. tempR = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
  9175. leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
  9176. rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
  9177. _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
  9178. _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
  9179. }
  9180. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  9181. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9182. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9183. mid = (mid << 1) | (side & 0x01);
  9184. pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
  9185. pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
  9186. }
  9187. } else {
  9188. shift -= 1;
  9189. for (i = 0; i < frameCount4; ++i) {
  9190. __m128i mid;
  9191. __m128i side;
  9192. __m128i tempL;
  9193. __m128i tempR;
  9194. __m128 leftf;
  9195. __m128 rightf;
  9196. mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  9197. side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  9198. mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
  9199. tempL = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
  9200. tempR = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
  9201. leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
  9202. rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
  9203. _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
  9204. _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
  9205. }
  9206. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  9207. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9208. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9209. mid = (mid << 1) | (side & 0x01);
  9210. pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
  9211. pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
  9212. }
  9213. }
  9214. }
  9215. #endif
  9216. #if defined(DRFLAC_SUPPORT_NEON)
  9217. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  9218. {
  9219. drflac_uint64 i;
  9220. drflac_uint64 frameCount4 = frameCount >> 2;
  9221. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  9222. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  9223. drflac_uint32 shift = unusedBitsPerSample - 8;
  9224. float factor;
  9225. float32x4_t factor4;
  9226. int32x4_t shift4;
  9227. int32x4_t wbps0_4; /* Wasted Bits Per Sample */
  9228. int32x4_t wbps1_4; /* Wasted Bits Per Sample */
  9229. DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
  9230. factor = 1.0f / 8388608.0f;
  9231. factor4 = vdupq_n_f32(factor);
  9232. wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
  9233. wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
  9234. if (shift == 0) {
  9235. for (i = 0; i < frameCount4; ++i) {
  9236. int32x4_t lefti;
  9237. int32x4_t righti;
  9238. float32x4_t leftf;
  9239. float32x4_t rightf;
  9240. uint32x4_t mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
  9241. uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
  9242. mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
  9243. lefti = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
  9244. righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
  9245. leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
  9246. rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
  9247. drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
  9248. }
  9249. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  9250. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9251. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9252. mid = (mid << 1) | (side & 0x01);
  9253. pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
  9254. pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
  9255. }
  9256. } else {
  9257. shift -= 1;
  9258. shift4 = vdupq_n_s32(shift);
  9259. for (i = 0; i < frameCount4; ++i) {
  9260. uint32x4_t mid;
  9261. uint32x4_t side;
  9262. int32x4_t lefti;
  9263. int32x4_t righti;
  9264. float32x4_t leftf;
  9265. float32x4_t rightf;
  9266. mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
  9267. side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
  9268. mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
  9269. lefti = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
  9270. righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
  9271. leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
  9272. rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
  9273. drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
  9274. }
  9275. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  9276. drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9277. drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9278. mid = (mid << 1) | (side & 0x01);
  9279. pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
  9280. pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
  9281. }
  9282. }
  9283. }
  9284. #endif
  9285. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  9286. {
  9287. #if defined(DRFLAC_SUPPORT_SSE2)
  9288. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  9289. drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9290. } else
  9291. #elif defined(DRFLAC_SUPPORT_NEON)
  9292. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  9293. drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9294. } else
  9295. #endif
  9296. {
  9297. /* Scalar fallback. */
  9298. #if 0
  9299. drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9300. #else
  9301. drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9302. #endif
  9303. }
  9304. }
  9305. #if 0
  9306. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  9307. {
  9308. for (drflac_uint64 i = 0; i < frameCount; ++i) {
  9309. pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
  9310. pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
  9311. }
  9312. }
  9313. #endif
  9314. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  9315. {
  9316. drflac_uint64 i;
  9317. drflac_uint64 frameCount4 = frameCount >> 2;
  9318. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  9319. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  9320. drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
  9321. drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
  9322. float factor = 1 / 2147483648.0;
  9323. for (i = 0; i < frameCount4; ++i) {
  9324. drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
  9325. drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
  9326. drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
  9327. drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
  9328. drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
  9329. drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
  9330. drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
  9331. drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
  9332. pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
  9333. pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
  9334. pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
  9335. pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
  9336. pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
  9337. pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
  9338. pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
  9339. pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
  9340. }
  9341. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  9342. pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
  9343. pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
  9344. }
  9345. }
  9346. #if defined(DRFLAC_SUPPORT_SSE2)
  9347. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  9348. {
  9349. drflac_uint64 i;
  9350. drflac_uint64 frameCount4 = frameCount >> 2;
  9351. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  9352. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  9353. drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  9354. drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  9355. float factor = 1.0f / 8388608.0f;
  9356. __m128 factor128 = _mm_set1_ps(factor);
  9357. for (i = 0; i < frameCount4; ++i) {
  9358. __m128i lefti;
  9359. __m128i righti;
  9360. __m128 leftf;
  9361. __m128 rightf;
  9362. lefti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
  9363. righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
  9364. leftf = _mm_mul_ps(_mm_cvtepi32_ps(lefti), factor128);
  9365. rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);
  9366. _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
  9367. _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
  9368. }
  9369. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  9370. pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
  9371. pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
  9372. }
  9373. }
  9374. #endif
  9375. #if defined(DRFLAC_SUPPORT_NEON)
  9376. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  9377. {
  9378. drflac_uint64 i;
  9379. drflac_uint64 frameCount4 = frameCount >> 2;
  9380. const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
  9381. const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
  9382. drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
  9383. drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
  9384. float factor = 1.0f / 8388608.0f;
  9385. float32x4_t factor4 = vdupq_n_f32(factor);
  9386. int32x4_t shift0_4 = vdupq_n_s32(shift0);
  9387. int32x4_t shift1_4 = vdupq_n_s32(shift1);
  9388. for (i = 0; i < frameCount4; ++i) {
  9389. int32x4_t lefti;
  9390. int32x4_t righti;
  9391. float32x4_t leftf;
  9392. float32x4_t rightf;
  9393. lefti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
  9394. righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
  9395. leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
  9396. rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
  9397. drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
  9398. }
  9399. for (i = (frameCount4 << 2); i < frameCount; ++i) {
  9400. pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
  9401. pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
  9402. }
  9403. }
  9404. #endif
  9405. static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
  9406. {
  9407. #if defined(DRFLAC_SUPPORT_SSE2)
  9408. if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
  9409. drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9410. } else
  9411. #elif defined(DRFLAC_SUPPORT_NEON)
  9412. if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
  9413. drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9414. } else
  9415. #endif
  9416. {
  9417. /* Scalar fallback. */
  9418. #if 0
  9419. drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9420. #else
  9421. drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
  9422. #endif
  9423. }
  9424. }
  9425. DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
  9426. {
  9427. drflac_uint64 framesRead;
  9428. drflac_uint32 unusedBitsPerSample;
  9429. if (pFlac == NULL || framesToRead == 0) {
  9430. return 0;
  9431. }
  9432. if (pBufferOut == NULL) {
  9433. return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
  9434. }
  9435. DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
  9436. unusedBitsPerSample = 32 - pFlac->bitsPerSample;
  9437. framesRead = 0;
  9438. while (framesToRead > 0) {
  9439. /* If we've run out of samples in this frame, go to the next. */
  9440. if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
  9441. if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
  9442. break; /* Couldn't read the next frame, so just break from the loop and return. */
  9443. }
  9444. } else {
  9445. unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
  9446. drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
  9447. drflac_uint64 frameCountThisIteration = framesToRead;
  9448. if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
  9449. frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
  9450. }
  9451. if (channelCount == 2) {
  9452. const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
  9453. const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
  9454. switch (pFlac->currentFLACFrame.header.channelAssignment)
  9455. {
  9456. case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
  9457. {
  9458. drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  9459. } break;
  9460. case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
  9461. {
  9462. drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  9463. } break;
  9464. case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
  9465. {
  9466. drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  9467. } break;
  9468. case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
  9469. default:
  9470. {
  9471. drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
  9472. } break;
  9473. }
  9474. } else {
  9475. /* Generic interleaving. */
  9476. drflac_uint64 i;
  9477. for (i = 0; i < frameCountThisIteration; ++i) {
  9478. unsigned int j;
  9479. for (j = 0; j < channelCount; ++j) {
  9480. drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
  9481. pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
  9482. }
  9483. }
  9484. }
  9485. framesRead += frameCountThisIteration;
  9486. pBufferOut += frameCountThisIteration * channelCount;
  9487. framesToRead -= frameCountThisIteration;
  9488. pFlac->currentPCMFrame += frameCountThisIteration;
  9489. pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
  9490. }
  9491. }
  9492. return framesRead;
  9493. }
  9494. DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
  9495. {
  9496. if (pFlac == NULL) {
  9497. return DRFLAC_FALSE;
  9498. }
  9499. /* Don't do anything if we're already on the seek point. */
  9500. if (pFlac->currentPCMFrame == pcmFrameIndex) {
  9501. return DRFLAC_TRUE;
  9502. }
  9503. /*
  9504. If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
  9505. when the decoder was opened.
  9506. */
  9507. if (pFlac->firstFLACFramePosInBytes == 0) {
  9508. return DRFLAC_FALSE;
  9509. }
  9510. if (pcmFrameIndex == 0) {
  9511. pFlac->currentPCMFrame = 0;
  9512. return drflac__seek_to_first_frame(pFlac);
  9513. } else {
  9514. drflac_bool32 wasSuccessful = DRFLAC_FALSE;
  9515. drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;
  9516. /* Clamp the sample to the end. */
  9517. if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
  9518. pcmFrameIndex = pFlac->totalPCMFrameCount;
  9519. }
  9520. /* If the target sample and the current sample are in the same frame we just move the position forward. */
  9521. if (pcmFrameIndex > pFlac->currentPCMFrame) {
  9522. /* Forward. */
  9523. drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
  9524. if (pFlac->currentFLACFrame.pcmFramesRemaining > offset) {
  9525. pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
  9526. pFlac->currentPCMFrame = pcmFrameIndex;
  9527. return DRFLAC_TRUE;
  9528. }
  9529. } else {
  9530. /* Backward. */
  9531. drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
  9532. drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
  9533. drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
  9534. if (currentFLACFramePCMFramesConsumed > offsetAbs) {
  9535. pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
  9536. pFlac->currentPCMFrame = pcmFrameIndex;
  9537. return DRFLAC_TRUE;
  9538. }
  9539. }
  9540. /*
  9541. Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
  9542. we'll instead use Ogg's natural seeking facility.
  9543. */
  9544. #ifndef DR_FLAC_NO_OGG
  9545. if (pFlac->container == drflac_container_ogg)
  9546. {
  9547. wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
  9548. }
  9549. else
  9550. #endif
  9551. {
  9552. /* First try seeking via the seek table. If this fails, fall back to a brute force seek which is much slower. */
  9553. if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
  9554. wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
  9555. }
  9556. #if !defined(DR_FLAC_NO_CRC)
  9557. /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
  9558. if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
  9559. wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
  9560. }
  9561. #endif
  9562. /* Fall back to brute force if all else fails. */
  9563. if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
  9564. wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
  9565. }
  9566. }
  9567. if (wasSuccessful) {
  9568. pFlac->currentPCMFrame = pcmFrameIndex;
  9569. } else {
  9570. /* Seek failed. Try putting the decoder back to it's original state. */
  9571. if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
  9572. /* Failed to seek back to the original PCM frame. Fall back to 0. */
  9573. drflac_seek_to_pcm_frame(pFlac, 0);
  9574. }
  9575. }
  9576. return wasSuccessful;
  9577. }
  9578. }
  9579. /* High Level APIs */
  9580. #if defined(SIZE_MAX)
  9581. #define DRFLAC_SIZE_MAX SIZE_MAX
  9582. #else
  9583. #if defined(DRFLAC_64BIT)
  9584. #define DRFLAC_SIZE_MAX ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
  9585. #else
  9586. #define DRFLAC_SIZE_MAX 0xFFFFFFFF
  9587. #endif
  9588. #endif
  9589. /* Using a macro as the definition of the drflac__full_decode_and_close_*() API family. Sue me. */
  9590. #define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
  9591. static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
  9592. { \
  9593. type* pSampleData = NULL; \
  9594. drflac_uint64 totalPCMFrameCount; \
  9595. \
  9596. DRFLAC_ASSERT(pFlac != NULL); \
  9597. \
  9598. totalPCMFrameCount = pFlac->totalPCMFrameCount; \
  9599. \
  9600. if (totalPCMFrameCount == 0) { \
  9601. type buffer[4096]; \
  9602. drflac_uint64 pcmFramesRead; \
  9603. size_t sampleDataBufferSize = sizeof(buffer); \
  9604. \
  9605. pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks); \
  9606. if (pSampleData == NULL) { \
  9607. goto on_error; \
  9608. } \
  9609. \
  9610. while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) { \
  9611. if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) { \
  9612. type* pNewSampleData; \
  9613. size_t newSampleDataBufferSize; \
  9614. \
  9615. newSampleDataBufferSize = sampleDataBufferSize * 2; \
  9616. pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks); \
  9617. if (pNewSampleData == NULL) { \
  9618. drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks); \
  9619. goto on_error; \
  9620. } \
  9621. \
  9622. sampleDataBufferSize = newSampleDataBufferSize; \
  9623. pSampleData = pNewSampleData; \
  9624. } \
  9625. \
  9626. DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type))); \
  9627. totalPCMFrameCount += pcmFramesRead; \
  9628. } \
  9629. \
  9630. /* At this point everything should be decoded, but we just want to fill the unused part buffer with silence - need to \
  9631. protect those ears from random noise! */ \
  9632. DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type))); \
  9633. } else { \
  9634. drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type); \
  9635. if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) { \
  9636. goto on_error; /* The decoded data is too big. */ \
  9637. } \
  9638. \
  9639. pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks); /* <-- Safe cast as per the check above. */ \
  9640. if (pSampleData == NULL) { \
  9641. goto on_error; \
  9642. } \
  9643. \
  9644. totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData); \
  9645. } \
  9646. \
  9647. if (sampleRateOut) *sampleRateOut = pFlac->sampleRate; \
  9648. if (channelsOut) *channelsOut = pFlac->channels; \
  9649. if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount; \
  9650. \
  9651. drflac_close(pFlac); \
  9652. return pSampleData; \
  9653. \
  9654. on_error: \
  9655. drflac_close(pFlac); \
  9656. return NULL; \
  9657. }
  9658. DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
  9659. DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
  9660. DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
  9661. DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
  9662. {
  9663. drflac* pFlac;
  9664. if (channelsOut) {
  9665. *channelsOut = 0;
  9666. }
  9667. if (sampleRateOut) {
  9668. *sampleRateOut = 0;
  9669. }
  9670. if (totalPCMFrameCountOut) {
  9671. *totalPCMFrameCountOut = 0;
  9672. }
  9673. pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
  9674. if (pFlac == NULL) {
  9675. return NULL;
  9676. }
  9677. return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
  9678. }
  9679. DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
  9680. {
  9681. drflac* pFlac;
  9682. if (channelsOut) {
  9683. *channelsOut = 0;
  9684. }
  9685. if (sampleRateOut) {
  9686. *sampleRateOut = 0;
  9687. }
  9688. if (totalPCMFrameCountOut) {
  9689. *totalPCMFrameCountOut = 0;
  9690. }
  9691. pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
  9692. if (pFlac == NULL) {
  9693. return NULL;
  9694. }
  9695. return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
  9696. }
  9697. DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
  9698. {
  9699. drflac* pFlac;
  9700. if (channelsOut) {
  9701. *channelsOut = 0;
  9702. }
  9703. if (sampleRateOut) {
  9704. *sampleRateOut = 0;
  9705. }
  9706. if (totalPCMFrameCountOut) {
  9707. *totalPCMFrameCountOut = 0;
  9708. }
  9709. pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
  9710. if (pFlac == NULL) {
  9711. return NULL;
  9712. }
  9713. return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
  9714. }
  9715. #ifndef DR_FLAC_NO_STDIO
  9716. DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  9717. {
  9718. drflac* pFlac;
  9719. if (sampleRate) {
  9720. *sampleRate = 0;
  9721. }
  9722. if (channels) {
  9723. *channels = 0;
  9724. }
  9725. if (totalPCMFrameCount) {
  9726. *totalPCMFrameCount = 0;
  9727. }
  9728. pFlac = drflac_open_file(filename, pAllocationCallbacks);
  9729. if (pFlac == NULL) {
  9730. return NULL;
  9731. }
  9732. return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
  9733. }
  9734. DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  9735. {
  9736. drflac* pFlac;
  9737. if (sampleRate) {
  9738. *sampleRate = 0;
  9739. }
  9740. if (channels) {
  9741. *channels = 0;
  9742. }
  9743. if (totalPCMFrameCount) {
  9744. *totalPCMFrameCount = 0;
  9745. }
  9746. pFlac = drflac_open_file(filename, pAllocationCallbacks);
  9747. if (pFlac == NULL) {
  9748. return NULL;
  9749. }
  9750. return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
  9751. }
  9752. DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  9753. {
  9754. drflac* pFlac;
  9755. if (sampleRate) {
  9756. *sampleRate = 0;
  9757. }
  9758. if (channels) {
  9759. *channels = 0;
  9760. }
  9761. if (totalPCMFrameCount) {
  9762. *totalPCMFrameCount = 0;
  9763. }
  9764. pFlac = drflac_open_file(filename, pAllocationCallbacks);
  9765. if (pFlac == NULL) {
  9766. return NULL;
  9767. }
  9768. return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
  9769. }
  9770. #endif
  9771. DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  9772. {
  9773. drflac* pFlac;
  9774. if (sampleRate) {
  9775. *sampleRate = 0;
  9776. }
  9777. if (channels) {
  9778. *channels = 0;
  9779. }
  9780. if (totalPCMFrameCount) {
  9781. *totalPCMFrameCount = 0;
  9782. }
  9783. pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
  9784. if (pFlac == NULL) {
  9785. return NULL;
  9786. }
  9787. return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
  9788. }
  9789. DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  9790. {
  9791. drflac* pFlac;
  9792. if (sampleRate) {
  9793. *sampleRate = 0;
  9794. }
  9795. if (channels) {
  9796. *channels = 0;
  9797. }
  9798. if (totalPCMFrameCount) {
  9799. *totalPCMFrameCount = 0;
  9800. }
  9801. pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
  9802. if (pFlac == NULL) {
  9803. return NULL;
  9804. }
  9805. return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
  9806. }
  9807. DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
  9808. {
  9809. drflac* pFlac;
  9810. if (sampleRate) {
  9811. *sampleRate = 0;
  9812. }
  9813. if (channels) {
  9814. *channels = 0;
  9815. }
  9816. if (totalPCMFrameCount) {
  9817. *totalPCMFrameCount = 0;
  9818. }
  9819. pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
  9820. if (pFlac == NULL) {
  9821. return NULL;
  9822. }
  9823. return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
  9824. }
  9825. DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
  9826. {
  9827. if (pAllocationCallbacks != NULL) {
  9828. drflac__free_from_callbacks(p, pAllocationCallbacks);
  9829. } else {
  9830. drflac__free_default(p, NULL);
  9831. }
  9832. }
  9833. DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
  9834. {
  9835. if (pIter == NULL) {
  9836. return;
  9837. }
  9838. pIter->countRemaining = commentCount;
  9839. pIter->pRunningData = (const char*)pComments;
  9840. }
  9841. DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
  9842. {
  9843. drflac_int32 length;
  9844. const char* pComment;
  9845. /* Safety. */
  9846. if (pCommentLengthOut) {
  9847. *pCommentLengthOut = 0;
  9848. }
  9849. if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
  9850. return NULL;
  9851. }
  9852. length = drflac__le2host_32(*(const drflac_uint32*)pIter->pRunningData);
  9853. pIter->pRunningData += 4;
  9854. pComment = pIter->pRunningData;
  9855. pIter->pRunningData += length;
  9856. pIter->countRemaining -= 1;
  9857. if (pCommentLengthOut) {
  9858. *pCommentLengthOut = length;
  9859. }
  9860. return pComment;
  9861. }
  9862. DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
  9863. {
  9864. if (pIter == NULL) {
  9865. return;
  9866. }
  9867. pIter->countRemaining = trackCount;
  9868. pIter->pRunningData = (const char*)pTrackData;
  9869. }
  9870. DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
  9871. {
  9872. drflac_cuesheet_track cuesheetTrack;
  9873. const char* pRunningData;
  9874. drflac_uint64 offsetHi;
  9875. drflac_uint64 offsetLo;
  9876. if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
  9877. return DRFLAC_FALSE;
  9878. }
  9879. pRunningData = pIter->pRunningData;
  9880. offsetHi = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  9881. offsetLo = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
  9882. cuesheetTrack.offset = offsetLo | (offsetHi << 32);
  9883. cuesheetTrack.trackNumber = pRunningData[0]; pRunningData += 1;
  9884. DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC)); pRunningData += 12;
  9885. cuesheetTrack.isAudio = (pRunningData[0] & 0x80) != 0;
  9886. cuesheetTrack.preEmphasis = (pRunningData[0] & 0x40) != 0; pRunningData += 14;
  9887. cuesheetTrack.indexCount = pRunningData[0]; pRunningData += 1;
  9888. cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData; pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);
  9889. pIter->pRunningData = pRunningData;
  9890. pIter->countRemaining -= 1;
  9891. if (pCuesheetTrack) {
  9892. *pCuesheetTrack = cuesheetTrack;
  9893. }
  9894. return DRFLAC_TRUE;
  9895. }
  9896. #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
  9897. #pragma GCC diagnostic pop
  9898. #endif
  9899. #endif /* dr_flac_c */
  9900. #endif /* DR_FLAC_IMPLEMENTATION */
  9901. /*
  9902. REVISION HISTORY
  9903. ================
  9904. v0.12.31 - 2021-08-16
  9905. - Silence some warnings.
  9906. v0.12.30 - 2021-07-31
  9907. - Fix platform detection for ARM64.
  9908. v0.12.29 - 2021-04-02
  9909. - Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
  9910. - Fix a decoding error due to an incorrect validation check.
  9911. v0.12.28 - 2021-02-21
  9912. - Fix a warning due to referencing _MSC_VER when it is undefined.
  9913. v0.12.27 - 2021-01-31
  9914. - Fix a static analysis warning.
  9915. v0.12.26 - 2021-01-17
  9916. - Fix a compilation warning due to _BSD_SOURCE being deprecated.
  9917. v0.12.25 - 2020-12-26
  9918. - Update documentation.
  9919. v0.12.24 - 2020-11-29
  9920. - Fix ARM64/NEON detection when compiling with MSVC.
  9921. v0.12.23 - 2020-11-21
  9922. - Fix compilation with OpenWatcom.
  9923. v0.12.22 - 2020-11-01
  9924. - Fix an error with the previous release.
  9925. v0.12.21 - 2020-11-01
  9926. - Fix a possible deadlock when seeking.
  9927. - Improve compiler support for older versions of GCC.
  9928. v0.12.20 - 2020-09-08
  9929. - Fix a compilation error on older compilers.
  9930. v0.12.19 - 2020-08-30
  9931. - Fix a bug due to an undefined 32-bit shift.
  9932. v0.12.18 - 2020-08-14
  9933. - Fix a crash when compiling with clang-cl.
  9934. v0.12.17 - 2020-08-02
  9935. - Simplify sized types.
  9936. v0.12.16 - 2020-07-25
  9937. - Fix a compilation warning.
  9938. v0.12.15 - 2020-07-06
  9939. - Check for negative LPC shifts and return an error.
  9940. v0.12.14 - 2020-06-23
  9941. - Add include guard for the implementation section.
  9942. v0.12.13 - 2020-05-16
  9943. - Add compile-time and run-time version querying.
  9944. - DRFLAC_VERSION_MINOR
  9945. - DRFLAC_VERSION_MAJOR
  9946. - DRFLAC_VERSION_REVISION
  9947. - DRFLAC_VERSION_STRING
  9948. - drflac_version()
  9949. - drflac_version_string()
  9950. v0.12.12 - 2020-04-30
  9951. - Fix compilation errors with VC6.
  9952. v0.12.11 - 2020-04-19
  9953. - Fix some pedantic warnings.
  9954. - Fix some undefined behaviour warnings.
  9955. v0.12.10 - 2020-04-10
  9956. - Fix some bugs when trying to seek with an invalid seek table.
  9957. v0.12.9 - 2020-04-05
  9958. - Fix warnings.
  9959. v0.12.8 - 2020-04-04
  9960. - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
  9961. - Fix some static analysis warnings.
  9962. - Minor documentation updates.
  9963. v0.12.7 - 2020-03-14
  9964. - Fix compilation errors with VC6.
  9965. v0.12.6 - 2020-03-07
  9966. - Fix compilation error with Visual Studio .NET 2003.
  9967. v0.12.5 - 2020-01-30
  9968. - Silence some static analysis warnings.
  9969. v0.12.4 - 2020-01-29
  9970. - Silence some static analysis warnings.
  9971. v0.12.3 - 2019-12-02
  9972. - Fix some warnings when compiling with GCC and the -Og flag.
  9973. - Fix a crash in out-of-memory situations.
  9974. - Fix potential integer overflow bug.
  9975. - Fix some static analysis warnings.
  9976. - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
  9977. - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.
  9978. v0.12.2 - 2019-10-07
  9979. - Internal code clean up.
  9980. v0.12.1 - 2019-09-29
  9981. - Fix some Clang Static Analyzer warnings.
  9982. - Fix an unused variable warning.
  9983. v0.12.0 - 2019-09-23
  9984. - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify their own memory allocation
  9985. routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
  9986. - drflac_open()
  9987. - drflac_open_relaxed()
  9988. - drflac_open_with_metadata()
  9989. - drflac_open_with_metadata_relaxed()
  9990. - drflac_open_file()
  9991. - drflac_open_file_with_metadata()
  9992. - drflac_open_memory()
  9993. - drflac_open_memory_with_metadata()
  9994. - drflac_open_and_read_pcm_frames_s32()
  9995. - drflac_open_and_read_pcm_frames_s16()
  9996. - drflac_open_and_read_pcm_frames_f32()
  9997. - drflac_open_file_and_read_pcm_frames_s32()
  9998. - drflac_open_file_and_read_pcm_frames_s16()
  9999. - drflac_open_file_and_read_pcm_frames_f32()
  10000. - drflac_open_memory_and_read_pcm_frames_s32()
  10001. - drflac_open_memory_and_read_pcm_frames_s16()
  10002. - drflac_open_memory_and_read_pcm_frames_f32()
  10003. Set this extra parameter to NULL to use defaults which is the same as the previous behaviour. Setting this NULL will use
  10004. DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
  10005. - Remove deprecated APIs:
  10006. - drflac_read_s32()
  10007. - drflac_read_s16()
  10008. - drflac_read_f32()
  10009. - drflac_seek_to_sample()
  10010. - drflac_open_and_decode_s32()
  10011. - drflac_open_and_decode_s16()
  10012. - drflac_open_and_decode_f32()
  10013. - drflac_open_and_decode_file_s32()
  10014. - drflac_open_and_decode_file_s16()
  10015. - drflac_open_and_decode_file_f32()
  10016. - drflac_open_and_decode_memory_s32()
  10017. - drflac_open_and_decode_memory_s16()
  10018. - drflac_open_and_decode_memory_f32()
  10019. - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
  10020. by doing pFlac->totalPCMFrameCount*pFlac->channels.
  10021. - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
  10022. - Fix errors when seeking to the end of a stream.
  10023. - Optimizations to seeking.
  10024. - SSE improvements and optimizations.
  10025. - ARM NEON optimizations.
  10026. - Optimizations to drflac_read_pcm_frames_s16().
  10027. - Optimizations to drflac_read_pcm_frames_s32().
  10028. v0.11.10 - 2019-06-26
  10029. - Fix a compiler error.
  10030. v0.11.9 - 2019-06-16
  10031. - Silence some ThreadSanitizer warnings.
  10032. v0.11.8 - 2019-05-21
  10033. - Fix warnings.
  10034. v0.11.7 - 2019-05-06
  10035. - C89 fixes.
  10036. v0.11.6 - 2019-05-05
  10037. - Add support for C89.
  10038. - Fix a compiler warning when CRC is disabled.
  10039. - Change license to choice of public domain or MIT-0.
  10040. v0.11.5 - 2019-04-19
  10041. - Fix a compiler error with GCC.
  10042. v0.11.4 - 2019-04-17
  10043. - Fix some warnings with GCC when compiling with -std=c99.
  10044. v0.11.3 - 2019-04-07
  10045. - Silence warnings with GCC.
  10046. v0.11.2 - 2019-03-10
  10047. - Fix a warning.
  10048. v0.11.1 - 2019-02-17
  10049. - Fix a potential bug with seeking.
  10050. v0.11.0 - 2018-12-16
  10051. - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
  10052. drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
  10053. and return PCM frame counts instead of sample counts. To upgrade you will need to change the input count by
  10054. dividing it by the channel count, and then do the same with the return value.
  10055. - API_CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
  10056. the changes to drflac_read_*() apply.
  10057. - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
  10058. the changes to drflac_read_*() apply.
  10059. - Optimizations.
  10060. v0.10.0 - 2018-09-11
  10061. - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
  10062. need to do it yourself via the callback API.
  10063. - Fix the clang build.
  10064. - Fix undefined behavior.
  10065. - Fix errors with CUESHEET metdata blocks.
  10066. - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
  10067. Vorbis comment API.
  10068. - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
  10069. - Minor optimizations.
  10070. v0.9.11 - 2018-08-29
  10071. - Fix a bug with sample reconstruction.
  10072. v0.9.10 - 2018-08-07
  10073. - Improve 64-bit detection.
  10074. v0.9.9 - 2018-08-05
  10075. - Fix C++ build on older versions of GCC.
  10076. v0.9.8 - 2018-07-24
  10077. - Fix compilation errors.
  10078. v0.9.7 - 2018-07-05
  10079. - Fix a warning.
  10080. v0.9.6 - 2018-06-29
  10081. - Fix some typos.
  10082. v0.9.5 - 2018-06-23
  10083. - Fix some warnings.
  10084. v0.9.4 - 2018-06-14
  10085. - Optimizations to seeking.
  10086. - Clean up.
  10087. v0.9.3 - 2018-05-22
  10088. - Bug fix.
  10089. v0.9.2 - 2018-05-12
  10090. - Fix a compilation error due to a missing break statement.
  10091. v0.9.1 - 2018-04-29
  10092. - Fix compilation error with Clang.
  10093. v0.9 - 2018-04-24
  10094. - Fix Clang build.
  10095. - Start using major.minor.revision versioning.
  10096. v0.8g - 2018-04-19
  10097. - Fix build on non-x86/x64 architectures.
  10098. v0.8f - 2018-02-02
  10099. - Stop pretending to support changing rate/channels mid stream.
  10100. v0.8e - 2018-02-01
  10101. - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
  10102. - Fix a crash the the Rice partition order is invalid.
  10103. v0.8d - 2017-09-22
  10104. - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.
  10105. v0.8c - 2017-09-07
  10106. - Fix warning on non-x86/x64 architectures.
  10107. v0.8b - 2017-08-19
  10108. - Fix build on non-x86/x64 architectures.
  10109. v0.8a - 2017-08-13
  10110. - A small optimization for the Clang build.
  10111. v0.8 - 2017-08-12
  10112. - API CHANGE: Rename dr_* types to drflac_*.
  10113. - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
  10114. - Add support for custom implementations of malloc(), realloc(), etc.
  10115. - Add CRC checking to Ogg encapsulated streams.
  10116. - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
  10117. - Bug fixes.
  10118. v0.7 - 2017-07-23
  10119. - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().
  10120. v0.6 - 2017-07-22
  10121. - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
  10122. never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.
  10123. v0.5 - 2017-07-16
  10124. - Fix typos.
  10125. - Change drflac_bool* types to unsigned.
  10126. - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.
  10127. v0.4f - 2017-03-10
  10128. - Fix a couple of bugs with the bitstreaming code.
  10129. v0.4e - 2017-02-17
  10130. - Fix some warnings.
  10131. v0.4d - 2016-12-26
  10132. - Add support for 32-bit floating-point PCM decoding.
  10133. - Use drflac_int* and drflac_uint* sized types to improve compiler support.
  10134. - Minor improvements to documentation.
  10135. v0.4c - 2016-12-26
  10136. - Add support for signed 16-bit integer PCM decoding.
  10137. v0.4b - 2016-10-23
  10138. - A minor change to drflac_bool8 and drflac_bool32 types.
  10139. v0.4a - 2016-10-11
  10140. - Rename drBool32 to drflac_bool32 for styling consistency.
  10141. v0.4 - 2016-09-29
  10142. - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
  10143. - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
  10144. - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
  10145. keep it consistent with drflac_audio.
  10146. v0.3f - 2016-09-21
  10147. - Fix a warning with GCC.
  10148. v0.3e - 2016-09-18
  10149. - Fixed a bug where GCC 4.3+ was not getting properly identified.
  10150. - Fixed a few typos.
  10151. - Changed date formats to ISO 8601 (YYYY-MM-DD).
  10152. v0.3d - 2016-06-11
  10153. - Minor clean up.
  10154. v0.3c - 2016-05-28
  10155. - Fixed compilation error.
  10156. v0.3b - 2016-05-16
  10157. - Fixed Linux/GCC build.
  10158. - Updated documentation.
  10159. v0.3a - 2016-05-15
  10160. - Minor fixes to documentation.
  10161. v0.3 - 2016-05-11
  10162. - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
  10163. - Lots of clean up.
  10164. v0.2b - 2016-05-10
  10165. - Bug fixes.
  10166. v0.2a - 2016-05-10
  10167. - Made drflac_open_and_decode() more robust.
  10168. - Removed an unused debugging variable
  10169. v0.2 - 2016-05-09
  10170. - Added support for Ogg encapsulation.
  10171. - API CHANGE. Have the onSeek callback take a third argument which specifies whether or not the seek
  10172. should be relative to the start or the current position. Also changes the seeking rules such that
  10173. seeking offsets will never be negative.
  10174. - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.
  10175. v0.1b - 2016-05-07
  10176. - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
  10177. - Removed a stale comment.
  10178. v0.1a - 2016-05-05
  10179. - Minor formatting changes.
  10180. - Fixed a warning on the GCC build.
  10181. v0.1 - 2016-05-03
  10182. - Initial versioned release.
  10183. */
  10184. /*
  10185. This software is available as a choice of the following licenses. Choose
  10186. whichever you prefer.
  10187. ===============================================================================
  10188. ALTERNATIVE 1 - Public Domain (www.unlicense.org)
  10189. ===============================================================================
  10190. This is free and unencumbered software released into the public domain.
  10191. Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
  10192. software, either in source code form or as a compiled binary, for any purpose,
  10193. commercial or non-commercial, and by any means.
  10194. In jurisdictions that recognize copyright laws, the author or authors of this
  10195. software dedicate any and all copyright interest in the software to the public
  10196. domain. We make this dedication for the benefit of the public at large and to
  10197. the detriment of our heirs and successors. We intend this dedication to be an
  10198. overt act of relinquishment in perpetuity of all present and future rights to
  10199. this software under copyright law.
  10200. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  10201. IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  10202. FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  10203. AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
  10204. ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
  10205. WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  10206. For more information, please refer to <http://unlicense.org/>
  10207. ===============================================================================
  10208. ALTERNATIVE 2 - MIT No Attribution
  10209. ===============================================================================
  10210. Copyright 2020 David Reid
  10211. Permission is hereby granted, free of charge, to any person obtaining a copy of
  10212. this software and associated documentation files (the "Software"), to deal in
  10213. the Software without restriction, including without limitation the rights to
  10214. use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
  10215. of the Software, and to permit persons to whom the Software is furnished to do
  10216. so.
  10217. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  10218. IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  10219. FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  10220. AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  10221. LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  10222. OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  10223. SOFTWARE.
  10224. */