Update libtheora to 1.1.1

LuisAntonRebollo 2014-07-06 12:09:19 +02:00
parent 40cefe1002
commit d9fc3abaa4
137 changed files with 27096 additions and 28242 deletions


@ -1,3 +1,65 @@
libtheora 1.1.1 (2009 October 1)
- Fix problems with MSVC inline assembly
- Add the missing encoder_disabled.c to the distribution
- build updates: autogen.sh should work better after switching systems
and the MSVC project now defaults to the dynamic runtime library
- Namespace some variables to avoid conflicts on wince.
libtheora 1.1.0 (2009 September 24)
- Fix various small issues with the example and telemetry code
- Fix handing a zero-byte packet as the first frame
- Documentation cleanup
- Two minor build fixes
libtheora 1.1beta3 (2009 August 22)
- Rate control fixes to smooth quality
- MSVC build now exports all of the 1.0 api
- Assorted small bug fixes
libtheora 1.1beta2 (2009 August 12)
- Fix a rate control problem with difficult input
- Build fixes for OpenBSD and Apple Xcode
- Examples now all use the 1.0 api
- TH_ENCCTL_SET_SPLEVEL works again
- Various bug fixes and source tree rearrangement
libtheora 1.1beta1 (2009 August 5)
- Support for two-pass encoding
- Performance optimization of both encoder and decoder
- Encoder supports dynamic adjustment of quality and
bitrate targets
- Encoder is generally more configurable, and all
rate control modes perform better
- Encoder now accepts 4:2:2 and 4:4:4 chroma sampling
- Decoder telemetry output shows quantization choice
and a breakdown of bitrate usage in the frame
- MSVC assembly optimizations up to date and functional
libtheora 1.1alpha2 (2009 May 26)
- Reduce lambda for small quantizers.
- New encoder fDCT does better on smooth gradients
- Use SATD for mode decisions (1-2% bitrate reduction)
- Assembly rewrite for new features and general speed up
- Share code between the encoder and decoder for performance
- Fix 4:2:2 decoding and telemetry
- MSVC project files updated, but assembly is disabled.
- New configure option --disable-spec to work around toolchain
detection failures.
- Limit symbol exports on MacOS X.
- Port remaining unit tests from the 1.0 release.
libtheora 1.1alpha1 (2009 March 27)
- Encoder rewrite with much improved vbr quality/bitrate and
better tracking of the target rate in cbr mode.
- MSVC project files do not work in this release.
libtheora 1.0 (2008 November 3)
- Merge x86 assembly for forward DCT from Thusnelda branch.


@ -1,4 +1,4 @@
Copyright (C) 2002-2008 Xiph.Org Foundation and contributors.
Copyright (C) 2002-2009 Xiph.org Foundation
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
@ -11,7 +11,7 @@ notice, this list of conditions and the following disclaimer.
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of the Xiph.Org Foundation nor the names of its
- Neither the name of the Xiph.org Foundation nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.


@ -0,0 +1,18 @@
Please see the file COPYING for the copyright license for this software.
In addition to and irrespective of the copyright license associated
with this software, On2 Technologies, Inc. makes the following statement
regarding technology used in this software:
On2 represents and warrants that it shall not assert any rights
relating to infringement of On2's registered patents, nor initiate
any litigation asserting such rights, against any person who, or
entity which utilizes the On2 VP3 Codec Software, including any
use, distribution, and sale of said Software; which make changes,
modifications, and improvements in said Software; and to use,
distribute, and sell said changes as well as applications for other
fields of use.
This reference implementation is originally derived from the On2 VP3
Codec Software, and the Theora video format is essentially compatible
with the VP3 video format, consisting of a backward-compatible superset.


@ -0,0 +1,3 @@
## Process this file with automake to produce Makefile.in
SUBDIRS = theora


@ -0,0 +1,414 @@
# Makefile.in generated by automake 1.6.3 from Makefile.am.
# @configure_input@
# Copyright 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002
# Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.
@SET_MAKE@
SHELL = @SHELL@
srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
prefix = @prefix@
exec_prefix = @exec_prefix@
bindir = @bindir@
sbindir = @sbindir@
libexecdir = @libexecdir@
datadir = @datadir@
sysconfdir = @sysconfdir@
sharedstatedir = @sharedstatedir@
localstatedir = @localstatedir@
libdir = @libdir@
infodir = @infodir@
mandir = @mandir@
includedir = @includedir@
oldincludedir = /usr/include
pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@
top_builddir = ..
ACLOCAL = @ACLOCAL@
AUTOCONF = @AUTOCONF@
AUTOMAKE = @AUTOMAKE@
AUTOHEADER = @AUTOHEADER@
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd
INSTALL = @INSTALL@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
INSTALL_DATA = @INSTALL_DATA@
install_sh_DATA = $(install_sh) -c -m 644
install_sh_PROGRAM = $(install_sh) -c
install_sh_SCRIPT = $(install_sh) -c
INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_HEADER = $(INSTALL_DATA)
transform = @program_transform_name@
NORMAL_INSTALL = :
PRE_INSTALL = :
POST_INSTALL = :
NORMAL_UNINSTALL = :
PRE_UNINSTALL = :
POST_UNINSTALL = :
host_alias = @host_alias@
host_triplet = @host@
EXEEXT = @EXEEXT@
OBJEXT = @OBJEXT@
PATH_SEPARATOR = @PATH_SEPARATOR@
ACLOCAL_AMFLAGS = @ACLOCAL_AMFLAGS@
AMTAR = @AMTAR@
AR = @AR@
ARGZ_H = @ARGZ_H@
AS = @AS@
AWK = @AWK@
BUILDABLE_EXAMPLES = @BUILDABLE_EXAMPLES@
CAIRO_CFLAGS = @CAIRO_CFLAGS@
CAIRO_LIBS = @CAIRO_LIBS@
CC = @CC@
CPP = @CPP@
CXX = @CXX@
CXXCPP = @CXXCPP@
DEBUG = @DEBUG@
DEPDIR = @DEPDIR@
DLLTOOL = @DLLTOOL@
DSYMUTIL = @DSYMUTIL@
DUMPBIN = @DUMPBIN@
F77 = @F77@
GCJ = @GCJ@
GCJFLAGS = @GCJFLAGS@
GETOPT_OBJS = @GETOPT_OBJS@
GREP = @GREP@
HAVE_BIBTEX = @HAVE_BIBTEX@
HAVE_DOXYGEN = @HAVE_DOXYGEN@
HAVE_PDFLATEX = @HAVE_PDFLATEX@
HAVE_PKG_CONFIG = @HAVE_PKG_CONFIG@
HAVE_TRANSFIG = @HAVE_TRANSFIG@
HAVE_VALGRIND = @HAVE_VALGRIND@
INCLTDL = @INCLTDL@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LD = @LD@
LIBADD_DL = @LIBADD_DL@
LIBADD_DLD_LINK = @LIBADD_DLD_LINK@
LIBADD_DLOPEN = @LIBADD_DLOPEN@
LIBADD_SHL_LOAD = @LIBADD_SHL_LOAD@
LIBLTDL = @LIBLTDL@
LIBM = @LIBM@
LIBTOOL = @LIBTOOL@
LIPO = @LIPO@
LN_S = @LN_S@
LTDLDEPS = @LTDLDEPS@
LTDLINCL = @LTDLINCL@
LTDLOPEN = @LTDLOPEN@
LT_CONFIG_H = @LT_CONFIG_H@
LT_DLLOADERS = @LT_DLLOADERS@
LT_DLPREOPEN = @LT_DLPREOPEN@
MAINT = @MAINT@
NM = @NM@
NMEDIT = @NMEDIT@
OBJDUMP = @OBJDUMP@
OGG_CFLAGS = @OGG_CFLAGS@
OGG_LIBS = @OGG_LIBS@
OSS_LIBS = @OSS_LIBS@
OTOOL = @OTOOL@
OTOOL64 = @OTOOL64@
PACKAGE = @PACKAGE@
PKG_CONFIG = @PKG_CONFIG@
PNG_CFLAGS = @PNG_CFLAGS@
PNG_LIBS = @PNG_LIBS@
PROFILE = @PROFILE@
RANLIB = @RANLIB@
RC = @RC@
SDL_CFLAGS = @SDL_CFLAGS@
SDL_CONFIG = @SDL_CONFIG@
SDL_LIBS = @SDL_LIBS@
SED = @SED@
STRIP = @STRIP@
THDEC_LIB_AGE = @THDEC_LIB_AGE@
THDEC_LIB_CURRENT = @THDEC_LIB_CURRENT@
THDEC_LIB_REVISION = @THDEC_LIB_REVISION@
THENC_LIB_AGE = @THENC_LIB_AGE@
THENC_LIB_CURRENT = @THENC_LIB_CURRENT@
THENC_LIB_REVISION = @THENC_LIB_REVISION@
THEORADEC_LDFLAGS = @THEORADEC_LDFLAGS@
THEORAENC_LDFLAGS = @THEORAENC_LDFLAGS@
THEORA_LDFLAGS = @THEORA_LDFLAGS@
TH_LIB_AGE = @TH_LIB_AGE@
TH_LIB_CURRENT = @TH_LIB_CURRENT@
TH_LIB_REVISION = @TH_LIB_REVISION@
VALGRIND_ENVIRONMENT = @VALGRIND_ENVIRONMENT@
VERSION = @VERSION@
VORBISENC_LIBS = @VORBISENC_LIBS@
VORBISFILE_LIBS = @VORBISFILE_LIBS@
VORBIS_CFLAGS = @VORBIS_CFLAGS@
VORBIS_LIBS = @VORBIS_LIBS@
am__include = @am__include@
am__quote = @am__quote@
install_sh = @install_sh@
lt_ECHO = @lt_ECHO@
ltdl_LIBOBJS = @ltdl_LIBOBJS@
ltdl_LTLIBOBJS = @ltdl_LTLIBOBJS@
sys_symbol_underscore = @sys_symbol_underscore@
SUBDIRS = theora
subdir = include
mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs
CONFIG_HEADER = $(top_builddir)/config.h
CONFIG_CLEAN_FILES =
DIST_SOURCES =
RECURSIVE_TARGETS = info-recursive dvi-recursive install-info-recursive \
uninstall-info-recursive all-recursive install-data-recursive \
install-exec-recursive installdirs-recursive install-recursive \
uninstall-recursive check-recursive installcheck-recursive
DIST_COMMON = Makefile.am Makefile.in
DIST_SUBDIRS = $(SUBDIRS)
all: all-recursive
.SUFFIXES:
$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ Makefile.am $(top_srcdir)/configure.ac $(ACLOCAL_M4)
cd $(top_srcdir) && \
$(AUTOMAKE) --gnu include/Makefile
Makefile: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.in $(top_builddir)/config.status
cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)
mostlyclean-libtool:
-rm -f *.lo
clean-libtool:
-rm -rf .libs _libs
distclean-libtool:
-rm -f libtool
uninstall-info-am:
# This directory's subdirectories are mostly independent; you can cd
# into them and run `make' without going through this Makefile.
# To change the values of `make' variables: instead of editing Makefiles,
# (1) if the variable is set in `config.status', edit `config.status'
# (which will cause the Makefiles to be regenerated when you run `make');
# (2) otherwise, pass the desired values on the `make' command line.
$(RECURSIVE_TARGETS):
@set fnord $$MAKEFLAGS; amf=$$2; \
dot_seen=no; \
target=`echo $@ | sed s/-recursive//`; \
list='$(SUBDIRS)'; for subdir in $$list; do \
echo "Making $$target in $$subdir"; \
if test "$$subdir" = "."; then \
dot_seen=yes; \
local_target="$$target-am"; \
else \
local_target="$$target"; \
fi; \
(cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \
|| case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \
done; \
if test "$$dot_seen" = "no"; then \
$(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \
fi; test -z "$$fail"
mostlyclean-recursive clean-recursive distclean-recursive \
maintainer-clean-recursive:
@set fnord $$MAKEFLAGS; amf=$$2; \
dot_seen=no; \
case "$@" in \
distclean-* | maintainer-clean-*) list='$(DIST_SUBDIRS)' ;; \
*) list='$(SUBDIRS)' ;; \
esac; \
rev=''; for subdir in $$list; do \
if test "$$subdir" = "."; then :; else \
rev="$$subdir $$rev"; \
fi; \
done; \
rev="$$rev ."; \
target=`echo $@ | sed s/-recursive//`; \
for subdir in $$rev; do \
echo "Making $$target in $$subdir"; \
if test "$$subdir" = "."; then \
local_target="$$target-am"; \
else \
local_target="$$target"; \
fi; \
(cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \
|| case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \
done && test -z "$$fail"
tags-recursive:
list='$(SUBDIRS)'; for subdir in $$list; do \
test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \
done
ETAGS = etags
ETAGSFLAGS =
tags: TAGS
ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES)
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
mkid -fID $$unique
TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \
$(TAGS_FILES) $(LISP)
tags=; \
here=`pwd`; \
list='$(SUBDIRS)'; for subdir in $$list; do \
if test "$$subdir" = .; then :; else \
test -f $$subdir/TAGS && tags="$$tags -i $$here/$$subdir/TAGS"; \
fi; \
done; \
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
test -z "$(ETAGS_ARGS)$$tags$$unique" \
|| $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \
$$tags $$unique
GTAGS:
here=`$(am__cd) $(top_builddir) && pwd` \
&& cd $(top_srcdir) \
&& gtags -i $(GTAGS_ARGS) $$here
distclean-tags:
-rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH
DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
top_distdir = ..
distdir = $(top_distdir)/$(PACKAGE)-$(VERSION)
distdir: $(DISTFILES)
@list='$(DISTFILES)'; for file in $$list; do \
if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \
dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \
if test "$$dir" != "$$file" && test "$$dir" != "."; then \
dir="/$$dir"; \
$(mkinstalldirs) "$(distdir)$$dir"; \
else \
dir=''; \
fi; \
if test -d $$d/$$file; then \
if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \
cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \
fi; \
cp -pR $$d/$$file $(distdir)$$dir || exit 1; \
else \
test -f $(distdir)/$$file \
|| cp -p $$d/$$file $(distdir)/$$file \
|| exit 1; \
fi; \
done
list='$(SUBDIRS)'; for subdir in $$list; do \
if test "$$subdir" = .; then :; else \
test -d $(distdir)/$$subdir \
|| mkdir $(distdir)/$$subdir \
|| exit 1; \
(cd $$subdir && \
$(MAKE) $(AM_MAKEFLAGS) \
top_distdir="$(top_distdir)" \
distdir=../$(distdir)/$$subdir \
distdir) \
|| exit 1; \
fi; \
done
check-am: all-am
check: check-recursive
all-am: Makefile
installdirs: installdirs-recursive
installdirs-am:
install: install-recursive
install-exec: install-exec-recursive
install-data: install-data-recursive
uninstall: uninstall-recursive
install-am: all-am
@$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am
installcheck: installcheck-recursive
install-strip:
$(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \
INSTALL_STRIP_FLAG=-s \
`test -z '$(STRIP)' || \
echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install
mostlyclean-generic:
clean-generic:
distclean-generic:
-rm -f Makefile $(CONFIG_CLEAN_FILES)
maintainer-clean-generic:
@echo "This command is intended for maintainers to use"
@echo "it deletes files that may require special tools to rebuild."
clean: clean-recursive
clean-am: clean-generic clean-libtool mostlyclean-am
distclean: distclean-recursive
distclean-am: clean-am distclean-generic distclean-libtool \
distclean-tags
dvi: dvi-recursive
dvi-am:
info: info-recursive
info-am:
install-data-am:
install-exec-am:
install-info: install-info-recursive
install-man:
installcheck-am:
maintainer-clean: maintainer-clean-recursive
maintainer-clean-am: distclean-am maintainer-clean-generic
mostlyclean: mostlyclean-recursive
mostlyclean-am: mostlyclean-generic mostlyclean-libtool
uninstall-am: uninstall-info-am
uninstall-info: uninstall-info-recursive
.PHONY: $(RECURSIVE_TARGETS) GTAGS all all-am check check-am clean \
clean-generic clean-libtool clean-recursive distclean \
distclean-generic distclean-libtool distclean-recursive \
distclean-tags distdir dvi dvi-am dvi-recursive info info-am \
info-recursive install install-am install-data install-data-am \
install-data-recursive install-exec install-exec-am \
install-exec-recursive install-info install-info-am \
install-info-recursive install-man install-recursive \
install-strip installcheck installcheck-am installdirs \
installdirs-am installdirs-recursive maintainer-clean \
maintainer-clean-generic maintainer-clean-recursive mostlyclean \
mostlyclean-generic mostlyclean-libtool mostlyclean-recursive \
tags tags-recursive uninstall uninstall-am uninstall-info-am \
uninstall-info-recursive uninstall-recursive
# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:


@ -0,0 +1,7 @@
## Process this file with automake to produce Makefile.in
theoraincludedir = $(includedir)/theora
theorainclude_HEADERS = theora.h theoradec.h theoraenc.h codec.h
noinst_HEADERS = codec.h theoradec.h


@ -0,0 +1,355 @@
# Makefile.in generated by automake 1.6.3 from Makefile.am.
# @configure_input@
# Copyright 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002
# Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.
@SET_MAKE@
SHELL = @SHELL@
srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
prefix = @prefix@
exec_prefix = @exec_prefix@
bindir = @bindir@
sbindir = @sbindir@
libexecdir = @libexecdir@
datadir = @datadir@
sysconfdir = @sysconfdir@
sharedstatedir = @sharedstatedir@
localstatedir = @localstatedir@
libdir = @libdir@
infodir = @infodir@
mandir = @mandir@
includedir = @includedir@
oldincludedir = /usr/include
pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@
top_builddir = ../..
ACLOCAL = @ACLOCAL@
AUTOCONF = @AUTOCONF@
AUTOMAKE = @AUTOMAKE@
AUTOHEADER = @AUTOHEADER@
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd
INSTALL = @INSTALL@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
INSTALL_DATA = @INSTALL_DATA@
install_sh_DATA = $(install_sh) -c -m 644
install_sh_PROGRAM = $(install_sh) -c
install_sh_SCRIPT = $(install_sh) -c
INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_HEADER = $(INSTALL_DATA)
transform = @program_transform_name@
NORMAL_INSTALL = :
PRE_INSTALL = :
POST_INSTALL = :
NORMAL_UNINSTALL = :
PRE_UNINSTALL = :
POST_UNINSTALL = :
host_alias = @host_alias@
host_triplet = @host@
EXEEXT = @EXEEXT@
OBJEXT = @OBJEXT@
PATH_SEPARATOR = @PATH_SEPARATOR@
ACLOCAL_AMFLAGS = @ACLOCAL_AMFLAGS@
AMTAR = @AMTAR@
AR = @AR@
ARGZ_H = @ARGZ_H@
AS = @AS@
AWK = @AWK@
BUILDABLE_EXAMPLES = @BUILDABLE_EXAMPLES@
CAIRO_CFLAGS = @CAIRO_CFLAGS@
CAIRO_LIBS = @CAIRO_LIBS@
CC = @CC@
CPP = @CPP@
CXX = @CXX@
CXXCPP = @CXXCPP@
DEBUG = @DEBUG@
DEPDIR = @DEPDIR@
DLLTOOL = @DLLTOOL@
DSYMUTIL = @DSYMUTIL@
DUMPBIN = @DUMPBIN@
F77 = @F77@
GCJ = @GCJ@
GCJFLAGS = @GCJFLAGS@
GETOPT_OBJS = @GETOPT_OBJS@
GREP = @GREP@
HAVE_BIBTEX = @HAVE_BIBTEX@
HAVE_DOXYGEN = @HAVE_DOXYGEN@
HAVE_PDFLATEX = @HAVE_PDFLATEX@
HAVE_PKG_CONFIG = @HAVE_PKG_CONFIG@
HAVE_TRANSFIG = @HAVE_TRANSFIG@
HAVE_VALGRIND = @HAVE_VALGRIND@
INCLTDL = @INCLTDL@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LD = @LD@
LIBADD_DL = @LIBADD_DL@
LIBADD_DLD_LINK = @LIBADD_DLD_LINK@
LIBADD_DLOPEN = @LIBADD_DLOPEN@
LIBADD_SHL_LOAD = @LIBADD_SHL_LOAD@
LIBLTDL = @LIBLTDL@
LIBM = @LIBM@
LIBTOOL = @LIBTOOL@
LIPO = @LIPO@
LN_S = @LN_S@
LTDLDEPS = @LTDLDEPS@
LTDLINCL = @LTDLINCL@
LTDLOPEN = @LTDLOPEN@
LT_CONFIG_H = @LT_CONFIG_H@
LT_DLLOADERS = @LT_DLLOADERS@
LT_DLPREOPEN = @LT_DLPREOPEN@
MAINT = @MAINT@
NM = @NM@
NMEDIT = @NMEDIT@
OBJDUMP = @OBJDUMP@
OGG_CFLAGS = @OGG_CFLAGS@
OGG_LIBS = @OGG_LIBS@
OSS_LIBS = @OSS_LIBS@
OTOOL = @OTOOL@
OTOOL64 = @OTOOL64@
PACKAGE = @PACKAGE@
PKG_CONFIG = @PKG_CONFIG@
PNG_CFLAGS = @PNG_CFLAGS@
PNG_LIBS = @PNG_LIBS@
PROFILE = @PROFILE@
RANLIB = @RANLIB@
RC = @RC@
SDL_CFLAGS = @SDL_CFLAGS@
SDL_CONFIG = @SDL_CONFIG@
SDL_LIBS = @SDL_LIBS@
SED = @SED@
STRIP = @STRIP@
THDEC_LIB_AGE = @THDEC_LIB_AGE@
THDEC_LIB_CURRENT = @THDEC_LIB_CURRENT@
THDEC_LIB_REVISION = @THDEC_LIB_REVISION@
THENC_LIB_AGE = @THENC_LIB_AGE@
THENC_LIB_CURRENT = @THENC_LIB_CURRENT@
THENC_LIB_REVISION = @THENC_LIB_REVISION@
THEORADEC_LDFLAGS = @THEORADEC_LDFLAGS@
THEORAENC_LDFLAGS = @THEORAENC_LDFLAGS@
THEORA_LDFLAGS = @THEORA_LDFLAGS@
TH_LIB_AGE = @TH_LIB_AGE@
TH_LIB_CURRENT = @TH_LIB_CURRENT@
TH_LIB_REVISION = @TH_LIB_REVISION@
VALGRIND_ENVIRONMENT = @VALGRIND_ENVIRONMENT@
VERSION = @VERSION@
VORBISENC_LIBS = @VORBISENC_LIBS@
VORBISFILE_LIBS = @VORBISFILE_LIBS@
VORBIS_CFLAGS = @VORBIS_CFLAGS@
VORBIS_LIBS = @VORBIS_LIBS@
am__include = @am__include@
am__quote = @am__quote@
install_sh = @install_sh@
lt_ECHO = @lt_ECHO@
ltdl_LIBOBJS = @ltdl_LIBOBJS@
ltdl_LTLIBOBJS = @ltdl_LTLIBOBJS@
sys_symbol_underscore = @sys_symbol_underscore@
theoraincludedir = $(includedir)/theora
theorainclude_HEADERS = theora.h theoradec.h theoraenc.h codec.h
noinst_HEADERS = codec.h theoradec.h
subdir = include/theora
mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs
CONFIG_HEADER = $(top_builddir)/config.h
CONFIG_CLEAN_FILES =
DIST_SOURCES =
HEADERS = $(noinst_HEADERS) $(theorainclude_HEADERS)
DIST_COMMON = $(noinst_HEADERS) $(theorainclude_HEADERS) Makefile.am \
Makefile.in
all: all-am
.SUFFIXES:
$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ Makefile.am $(top_srcdir)/configure.ac $(ACLOCAL_M4)
cd $(top_srcdir) && \
$(AUTOMAKE) --gnu include/theora/Makefile
Makefile: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.in $(top_builddir)/config.status
cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)
mostlyclean-libtool:
-rm -f *.lo
clean-libtool:
-rm -rf .libs _libs
distclean-libtool:
-rm -f libtool
uninstall-info-am:
theoraincludeHEADERS_INSTALL = $(INSTALL_HEADER)
install-theoraincludeHEADERS: $(theorainclude_HEADERS)
@$(NORMAL_INSTALL)
$(mkinstalldirs) $(DESTDIR)$(theoraincludedir)
@list='$(theorainclude_HEADERS)'; for p in $$list; do \
if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \
f="`echo $$p | sed -e 's|^.*/||'`"; \
echo " $(theoraincludeHEADERS_INSTALL) $$d$$p $(DESTDIR)$(theoraincludedir)/$$f"; \
$(theoraincludeHEADERS_INSTALL) $$d$$p $(DESTDIR)$(theoraincludedir)/$$f; \
done
uninstall-theoraincludeHEADERS:
@$(NORMAL_UNINSTALL)
@list='$(theorainclude_HEADERS)'; for p in $$list; do \
f="`echo $$p | sed -e 's|^.*/||'`"; \
echo " rm -f $(DESTDIR)$(theoraincludedir)/$$f"; \
rm -f $(DESTDIR)$(theoraincludedir)/$$f; \
done
ETAGS = etags
ETAGSFLAGS =
tags: TAGS
ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES)
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
mkid -fID $$unique
TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \
$(TAGS_FILES) $(LISP)
tags=; \
here=`pwd`; \
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
test -z "$(ETAGS_ARGS)$$tags$$unique" \
|| $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \
$$tags $$unique
GTAGS:
here=`$(am__cd) $(top_builddir) && pwd` \
&& cd $(top_srcdir) \
&& gtags -i $(GTAGS_ARGS) $$here
distclean-tags:
-rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH
DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
top_distdir = ../..
distdir = $(top_distdir)/$(PACKAGE)-$(VERSION)
distdir: $(DISTFILES)
@list='$(DISTFILES)'; for file in $$list; do \
if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \
dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \
if test "$$dir" != "$$file" && test "$$dir" != "."; then \
dir="/$$dir"; \
$(mkinstalldirs) "$(distdir)$$dir"; \
else \
dir=''; \
fi; \
if test -d $$d/$$file; then \
if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \
cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \
fi; \
cp -pR $$d/$$file $(distdir)$$dir || exit 1; \
else \
test -f $(distdir)/$$file \
|| cp -p $$d/$$file $(distdir)/$$file \
|| exit 1; \
fi; \
done
check-am: all-am
check: check-am
all-am: Makefile $(HEADERS)
installdirs:
$(mkinstalldirs) $(DESTDIR)$(theoraincludedir)
install: install-am
install-exec: install-exec-am
install-data: install-data-am
uninstall: uninstall-am
install-am: all-am
@$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am
installcheck: installcheck-am
install-strip:
$(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \
INSTALL_STRIP_FLAG=-s \
`test -z '$(STRIP)' || \
echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install
mostlyclean-generic:
clean-generic:
distclean-generic:
-rm -f Makefile $(CONFIG_CLEAN_FILES)
maintainer-clean-generic:
@echo "This command is intended for maintainers to use"
@echo "it deletes files that may require special tools to rebuild."
clean: clean-am
clean-am: clean-generic clean-libtool mostlyclean-am
distclean: distclean-am
distclean-am: clean-am distclean-generic distclean-libtool \
distclean-tags
dvi: dvi-am
dvi-am:
info: info-am
info-am:
install-data-am: install-theoraincludeHEADERS
install-exec-am:
install-info: install-info-am
install-man:
installcheck-am:
maintainer-clean: maintainer-clean-am
maintainer-clean-am: distclean-am maintainer-clean-generic
mostlyclean: mostlyclean-am
mostlyclean-am: mostlyclean-generic mostlyclean-libtool
uninstall-am: uninstall-info-am uninstall-theoraincludeHEADERS
.PHONY: GTAGS all all-am check check-am clean clean-generic \
clean-libtool distclean distclean-generic distclean-libtool \
distclean-tags distdir dvi dvi-am info info-am install \
install-am install-data install-data-am install-exec \
install-exec-am install-info install-info-am install-man \
install-strip install-theoraincludeHEADERS installcheck \
installcheck-am installdirs maintainer-clean \
maintainer-clean-generic mostlyclean mostlyclean-generic \
mostlyclean-libtool tags uninstall uninstall-am \
uninstall-info-am uninstall-theoraincludeHEADERS
# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:


@ -5,7 +5,7 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
@ -24,10 +24,10 @@
* implementation for <a href="http://www.theora.org/">Theora</a>, a free,
* patent-unencumbered video codec.
* Theora is derived from On2's VP3 codec with additional features and
* integration for Ogg multimedia formats by
* integration with Ogg multimedia formats by
* <a href="http://www.xiph.org/">the Xiph.Org Foundation</a>.
* Complete documentation of the format itself is available in
* <a href="http://www.theora.org/doc/Theora_I_spec.pdf">the Theora
* <a href="http://www.theora.org/doc/Theora.pdf">the Theora
* specification</a>.
*
* \subsection Organization
@ -92,9 +92,9 @@ extern "C" {
/*@}*/
/**The currently defined color space tags.
* See <a href="http://www.theora.org/doc/Theora_I_spec.pdf">the Theora
* specification</a>, Chapter 4, for exact details on the meaning of each of
* these color spaces.*/
* See <a href="http://www.theora.org/doc/Theora.pdf">the Theora
* specification</a>, Chapter 4, for exact details on the meaning
* of each of these color spaces.*/
typedef enum{
/**The color space was not specified at the encoder.
It may be conveyed by an external means.*/
@ -108,13 +108,13 @@ typedef enum{
}th_colorspace;
/**The currently defined pixel format tags.
* See <a href="http://www.theora.org/doc/Theora_I_spec.pdf">the Theora
* See <a href="http://www.theora.org/doc/Theora.pdf">the Theora
* specification</a>, Section 4.4, for details on the precise sample
* locations.*/
typedef enum{
/**Chroma decimation by 2 in both the X and Y directions (4:2:0).
The Cb and Cr chroma planes are half the width and half the height of the
luma plane.*/
The Cb and Cr chroma planes are half the width and half the
height of the luma plane.*/
TH_PF_420,
/**Currently reserved.*/
TH_PF_RSVD,
@ -133,11 +133,11 @@ typedef enum{
/**A buffer for a single color plane in an uncompressed image.
* This contains the image data in a left-to-right, top-down format.
* Each row of pixels is stored contiguously in memory, but successive rows
* need not be.
* Each row of pixels is stored contiguously in memory, but successive
* rows need not be.
* Use \a stride to compute the offset of the next row.
* The encoder accepts both positive \a stride values (top-down in memory) and
* negative (bottom-up in memory).
* The encoder accepts both positive \a stride values (top-down in memory)
* and negative (bottom-up in memory).
* The decoder currently always generates images with positive strides.*/
typedef struct{
/**The width of this plane.*/
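
The stride rule documented in this hunk can be sketched roughly as follows. The `plane` struct and the `plane_sample` helper are hypothetical stand-ins that mirror the documented fields (width, height, stride, data); they are not part of libtheora, and the sketch assumes `data` points at the first sample of the top image row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for th_img_plane; not the library type. */
typedef struct {
  int width;            /* samples per row */
  int height;           /* number of rows */
  int stride;           /* byte offset between rows; may be negative */
  unsigned char *data;  /* first sample of the top image row */
} plane;

/* Returns the sample at column x, row y, where row 0 is the top row.
   Rows are reached through the stride, never by assuming that
   consecutive rows are adjacent in memory. */
static unsigned char plane_sample(const plane *p, int x, int y) {
  assert(x >= 0 && x < p->width && y >= 0 && y < p->height);
  return p->data[(ptrdiff_t)y * p->stride + x];
}
```

With a negative stride, `data` still refers to the top row, which then sits at the highest address of the buffer, so the same helper should cover both the top-down and the bottom-up layout.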
@ -151,18 +151,18 @@ typedef struct{
}th_img_plane;
/**A complete image buffer for an uncompressed frame.
* The chroma planes may be decimated by a factor of two in either direction,
* as indicated by th_info#pixel_fmt.
* The chroma planes may be decimated by a factor of two in either
* direction, as indicated by th_info#pixel_fmt.
* The width and height of the Y' plane must be multiples of 16.
* They may need to be cropped for display, using the rectangle specified by
* th_info#pic_x, th_info#pic_y, th_info#pic_width, and
* th_info#pic_height.
* They may need to be cropped for display, using the rectangle
* specified by th_info#pic_x, th_info#pic_y, th_info#pic_width,
* and th_info#pic_height.
* All samples are 8 bits.
* \note The term YUV often used to describe a colorspace is ambiguous.
* The exact parameters of the RGB to YUV conversion process aside, in many
* contexts the U and V channels actually have opposite meanings.
* To avoid this confusion, we are explicit: the name of the color channels are
* Y'CbCr, and they appear in that order, always.
* The exact parameters of the RGB to YUV conversion process aside, in
* many contexts the U and V channels actually have opposite meanings.
* To avoid this confusion, we are explicit: the name of the color
* channels are Y'CbCr, and they appear in that order, always.
* The prime symbol denotes that the Y channel is non-linear.
* Cb and Cr stand for "Chroma blue" and "Chroma red", respectively.*/
typedef th_img_plane th_ycbcr_buffer[3];
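
As an illustration of the chroma decimation described in this hunk, the sketch below derives the chroma plane size from the luma plane size. The `pixel_fmt` enum and `chroma_size` function are hypothetical names introduced here; they only model the 4:2:0, 4:2:2 and 4:4:4 layouts mentioned in the header and the changelog, and are not the library's own tags.

```c
#include <assert.h>

/* Hypothetical pixel format tags; not the th_pixel_fmt values. */
typedef enum { FMT_420, FMT_422, FMT_444 } pixel_fmt;

/* Computes the chroma plane size for a luma plane of luma_w x luma_h.
   Both luma dimensions are assumed to be multiples of 16, as the
   header requires for the Y' plane. */
static void chroma_size(pixel_fmt fmt, int luma_w, int luma_h,
                        int *chroma_w, int *chroma_h) {
  assert(luma_w % 16 == 0 && luma_h % 16 == 0);
  /* 4:2:0 and 4:2:2 halve the width; only 4:2:0 also halves the height. */
  *chroma_w = (fmt == FMT_444) ? luma_w : luma_w / 2;
  *chroma_h = (fmt == FMT_420) ? luma_h / 2 : luma_h;
}
```

Because the luma dimensions are multiples of 16, the divisions by two are exact and no rounding rule is needed in this sketch.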
@ -192,7 +192,7 @@ typedef th_img_plane th_ycbcr_buffer[3];
*
* It is also generally recommended that the offsets and sizes should still be
* multiples of 2 to avoid chroma sampling shifts when chroma is sub-sampled.
* See <a href="http://www.theora.org/doc/Theora_I_spec.pdf">the Theora
* See <a href="http://www.theora.org/doc/Theora.pdf">the Theora
* specification</a>, Section 4.4, for more details.
*
* Frame rate, in frames per second, is stored as a rational fraction, as is
@ -230,8 +230,8 @@ typedef struct{
* #frame_height-#pic_height-#pic_y must be no larger than 255.
* This slightly funny restriction is due to the fact that the offset is
* specified from the top of the image for consistency with the standard
* graphics left-handed coordinate system used throughout this API, while it
* is stored in the encoded stream as an offset from the bottom.*/
* graphics left-handed coordinate system used throughout this API, while
* it is stored in the encoded stream as an offset from the bottom.*/
ogg_uint32_t pic_y;
/**\name Frame rate
* The frame rate, as a fraction.
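
The pic_y restriction described earlier in this hunk can be checked with a few lines of arithmetic. The `bottom_offset` function below is a hypothetical helper, not a libtheora call; its parameter names simply echo the th_info fields, and it assumes the frame height is a multiple of 16.

```c
#include <assert.h>

/* Returns the picture offset measured from the bottom of the frame, or
   -1 if the picture does not fit in the frame or the offset exceeds 255.
   Hypothetical helper for illustration only. */
static long bottom_offset(unsigned long frame_height,
                          unsigned long pic_height,
                          unsigned long pic_y) {
  unsigned long off;
  assert(frame_height % 16 == 0);
  /* Reject rectangles that extend past the frame before subtracting,
     so the unsigned arithmetic cannot wrap around. */
  if (pic_height > frame_height || pic_y > frame_height - pic_height) {
    return -1;
  }
  off = frame_height - pic_height - pic_y;
  return off <= 255 ? (long)off : -1;
}
```

For a 1080-row picture placed at the top of a 1088-row frame, the helper returns 8, which is the value that would be measured from the bottom edge.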
@ -259,9 +259,6 @@ typedef struct{
/**The target bit-rate in bits per second.
If initializing an encoder with this struct, set this field to a non-zero
value to activate CBR encoding by default.*/
/*TODO: Current encoder does not support CBR mode, or anything like it.
We also don't really know what nominal rate each quality level
corresponds to yet.*/
int target_bitrate;
/**The target quality level.
Valid values range from 0 to 63, inclusive, with higher values giving
@ -314,7 +311,7 @@ typedef struct{
* A particular tag may occur more than once, and order is significant.
* The character set encoding for the strings is always UTF-8, but the tag
* names are limited to ASCII, and treated as case-insensitive.
* See <a href="http://www.theora.org/doc/Theora_I_spec.pdf">the Theora
* See <a href="http://www.theora.org/doc/Theora.pdf">the Theora
* specification</a>, Section 6.3.3 for details.
*
* In filling in this structure, th_decode_headerin() will null-terminate
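
The case-insensitive tag matching described in this hunk might look like the sketch below. It assumes each comment is a single "NAME=value" string; `comment_value` is a hypothetical helper written for this illustration, not the library's own query function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns a pointer to the value part of a "NAME=value" comment if its
   name matches tag case-insensitively, or NULL otherwise.
   Hypothetical helper; tag is assumed to be ASCII and to contain no '='. */
static const char *comment_value(const char *comment, const char *tag) {
  size_t i;
  assert(comment != NULL && tag != NULL);
  for (i = 0; tag[i] != '\0'; i++) {
    /* A comment shorter than the tag fails here on its terminator. */
    if (toupper((unsigned char)comment[i]) !=
        toupper((unsigned char)tag[i])) {
      return NULL;
    }
  }
  /* The name must end exactly where the tag ends. */
  return comment[i] == '=' ? comment + i + 1 : NULL;
}
```

Only the tag name is folded to upper case; the value is returned untouched, since it may contain arbitrary UTF-8.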


@ -5,7 +5,7 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
@ -27,11 +27,11 @@ extern "C"
#include <ogg/ogg.h>
/** \defgroup oldfuncs Legacy pre-1.0 C API */
/* @{ */
/** \mainpage
*
/** \file
* The libtheora pre-1.0 legacy C API.
*
* \ingroup oldfuncs
*
* \section intro Introduction
*
* This is the documentation for the libtheora legacy C API, declared in
@ -42,7 +42,7 @@ extern "C"
*
* libtheora is the reference implementation for
* <a href="http://www.theora.org/">Theora</a>, a free video codec.
* Theora is derived from On2's VP3 codec with improved integration for
* Theora is derived from On2's VP3 codec with improved integration with
* Ogg multimedia formats by <a href="http://www.xiph.org/">Xiph.Org</a>.
*
* \section overview Overview
@ -114,21 +114,11 @@ extern "C"
* checking beyond whether a header bit is present. Instead, use the
* theora_decode_header() function and check the return value; or examine the
* header bytes at the beginning of the Ogg page.
*
* \subsection example Example Decoder
*
* See <a href="http://svn.xiph.org/trunk/theora/examples/dump_video.c">
* examples/dump_video.c</a> for a simple decoder implementation.
*
* \section encoding Encoding Process
*
* See <a href="http://svn.xiph.org/trunk/theora/examples/encoder_example.c">
* examples/encoder_example.c</a> for a simple encoder implementation.
*/
/** \file
* The libtheora pre-1.0 legacy C API.
*/
/** \defgroup oldfuncs Legacy pre-1.0 C API */
/* @{ */
/**
* A YUV buffer for passing uncompressed frames to and from the codec.
@ -292,14 +282,21 @@ typedef struct theora_comment{
/**\name theora_control() codes */
/**\anchor decctlcodes
/* \anchor decctlcodes_old
* These are the available request codes for theora_control()
* when called with a decoder instance.
* By convention, these are odd, to distinguish them from the
* \ref encctlcodes "encoder control codes".
* By convention decoder control codes are odd, to distinguish
* them from \ref encctlcodes_old "encoder control codes" which
* are even.
*
* Note that since the 1.0 release, both the legacy and the final
* implementation accept all the same control codes, but only the
* final API declares the newer codes.
*
* Keep any experimental or vendor-specific values above \c 0x8000.*/
/*@{*/
/**Get the maximum post-processing level.
* The decoder supports a post-processing filter that can improve
* the appearance of the decoded images. This returns the highest
@ -324,9 +321,9 @@ typedef struct theora_comment{
* \param[in] buf <tt>ogg_uint32_t</tt>: The maximum distance between key
* frames.
* \param[out] buf <tt>ogg_uint32_t</tt>: The actual maximum distance set.
* \retval TH_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a buf_sz is not <tt>sizeof(ogg_uint32_t)</tt>.
* \retval TH_IMPL Not supported by this implementation.*/
* \retval OC_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval OC_EINVAL \a buf_sz is not <tt>sizeof(ogg_uint32_t)</tt>.
* \retval OC_IMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_KEYFRAME_FREQUENCY_FORCE (4)
/**Set the granule position.
@ -338,33 +335,23 @@ typedef struct theora_comment{
*/
#define TH_DECCTL_SET_GRANPOS (5)
/**\anchor encctlcodes_old */
/**\anchor encctlcodes
* These are the available request codes for theora_control()
* when called with an encoder instance.
* By convention, these are even, to distinguish them from the
* \ref decctlcodes "decoder control codes".
* Keep any experimental or vendor-specific values above \c 0x8000.*/
/*@{*/
/**Sets the quantization parameters to use.
* The parameters are copied, not stored by reference, so they can be freed
* after this call.
* <tt>NULL</tt> may be specified to revert to the default parameters.
* For the current encoder, <tt>scale[ci!=0][qi]</tt> must be no greater than
* <tt>scale[ci!=0][qi-1]</tt> and <tt>base[qti][pli][qi][ci]</tt> must be no
* greater than <tt>base[qti][pli][qi-1][ci]</tt>.
* These two conditions ensure that the actual quantizer for a given \a qti,
* \a pli, and \a ci does not increase as \a qi increases.
*
* \param[in] buf #th_quant_info
* \retval TH_FAULT \a theora_state is <tt>NULL</tt>.
* \retval TH_EINVAL Encoding has already begun, the quantization parameters
* do not meet one of the above stated conditions, \a buf
* is <tt>NULL</tt> and \a buf_sz is not zero, or \a buf
* is non-<tt>NULL</tt> and \a buf_sz is not
* <tt>sizeof(#th_quant_info)</tt>.
* \retval TH_IMPL Not supported by this implementation.*/
* \retval OC_FAULT \a theora_state is <tt>NULL</tt>.
* \retval OC_EINVAL Encoding has already begun, the quantization parameters
* are not acceptable to this version of the encoder,
* \a buf is <tt>NULL</tt> and \a buf_sz is not zero,
* or \a buf is non-<tt>NULL</tt> and \a buf_sz is
* not <tt>sizeof(#th_quant_info)</tt>.
* \retval OC_IMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_QUANT_PARAMS (2)
/**Disables any encoder features that would prevent lossless transcoding back
* to VP3.
* This primarily means disabling block-level QI values and not using 4MV mode
@ -389,10 +376,11 @@ typedef struct theora_comment{
* 4:2:0, the picture region is smaller than the full frame,
* or if encoding has begun, preventing the quantization
* tables and codebooks from being set.
* \retval TH_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a buf_sz is not <tt>sizeof(int)</tt>.
* \retval TH_IMPL Not supported by this implementation.*/
* \retval OC_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval OC_EINVAL \a buf_sz is not <tt>sizeof(int)</tt>.
* \retval OC_IMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_VP3_COMPATIBLE (10)
/**Gets the maximum speed level.
* Higher speed levels favor quicker encoding over better quality per bit.
* Depending on the encoding mode, and the internal algorithms used, quality
@ -402,25 +390,27 @@ typedef struct theora_comment{
* the current encoding mode (VBR vs. CQI, etc.).
*
* \param[out] buf int: The maximum encoding speed level.
* \retval TH_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a buf_sz is not <tt>sizeof(int)</tt>.
* \retval TH_IMPL Not supported by this implementation in the current
* \retval OC_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval OC_EINVAL \a buf_sz is not <tt>sizeof(int)</tt>.
* \retval OC_IMPL Not supported by this implementation in the current
* encoding mode.*/
#define TH_ENCCTL_GET_SPLEVEL_MAX (12)
/**Sets the speed level.
* By default a speed value of 1 is used.
*
* \param[in] buf int: The new encoding speed level.
* 0 is slowest, larger values use less CPU.
* \retval TH_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a buf_sz is not <tt>sizeof(int)</tt>, or the
* \retval OC_FAULT \a theora_state or \a buf is <tt>NULL</tt>.
* \retval OC_EINVAL \a buf_sz is not <tt>sizeof(int)</tt>, or the
* encoding speed level is out of bounds.
* The maximum encoding speed level may be
* implementation- and encoding mode-specific, and can be
* obtained via #TH_ENCCTL_GET_SPLEVEL_MAX.
* \retval TH_IMPL Not supported by this implementation in the current
* \retval OC_IMPL Not supported by this implementation in the current
* encoding mode.*/
#define TH_ENCCTL_SET_SPLEVEL (14)
/*@}*/
#define OC_FAULT -1 /**< General failure */
@ -779,8 +769,8 @@ extern void theora_comment_clear(theora_comment *tc);
* This is used to provide advanced control of the encoding process.
* \param th A #theora_state handle.
* \param req The control code to process.
* See \ref encctlcodes "the list of available control codes"
* for details.
* See \ref encctlcodes_old "the list of available
* control codes" for details.
* \param buf The parameters for this control code.
* \param buf_sz The size of the parameter buffer.*/
extern int theora_control(theora_state *th,int req,void *buf,size_t buf_sz);
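
The numbering convention documented for theora_control() (decoder codes odd, encoder codes even, experimental or vendor-specific values above 0x8000) can be expressed as two small predicates. This is a sketch under stated assumptions: the three macro values are copied from the hunks above, while ctl_is_decoder_code and ctl_is_vendor_code are hypothetical names that do not exist in libtheora.

```c
#include <assert.h>

/* Values copied from the control code definitions shown in this diff. */
#define TH_ENCCTL_SET_KEYFRAME_FREQUENCY_FORCE (4)
#define TH_DECCTL_SET_GRANPOS (5)
#define TH_ENCCTL_SET_VP3_COMPATIBLE (10)

/* Hypothetical helper: by convention decoder control codes are odd and
   encoder control codes are even. */
static int ctl_is_decoder_code(int req){
  return (req&1)!=0;
}

/* Hypothetical helper: experimental or vendor-specific codes are expected
   to be kept above 0x8000. */
static int ctl_is_vendor_code(int req){
  return req>0x8000;
}
```

A wrapper could use such checks to catch a decoder code passed to an encoder instance before the call is made; whether the library performs an equivalent check internally is not stated in this diff.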


@ -5,7 +5,7 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
@ -38,6 +38,10 @@ extern "C" {
* Keep any experimental or vendor-specific values above \c 0x8000.*/
/*@{*/
/**Gets the maximum post-processing level.
* The decoder supports a post-processing filter that can improve
* the appearance of the decoded images. This returns the highest
* level setting for this post-processor, corresponding to maximum
* improvement and computational expense.
*
* \param[out] _buf int: The maximum post-processing level.
* \retval TH_EFAULT \a _dec_ctx or \a _buf is <tt>NULL</tt>.
@ -47,6 +51,10 @@ extern "C" {
/**Sets the post-processing level.
* By default, post-processing is disabled.
*
* Sets the level of post-processing to use when decoding the
* compressed stream. This must be a value between zero (off)
* and the maximum returned by TH_DECCTL_GET_PPLEVEL_MAX.
*
* \param[in] _buf int: The new post-processing level.
* 0 to disable; larger values use more CPU.
* \retval TH_EFAULT \a _dec_ctx or \a _buf is <tt>NULL</tt>.
@ -83,6 +91,15 @@ extern "C" {
* \retval TH_EINVAL \a _buf_sz is not
* <tt>sizeof(th_stripe_callback)</tt>.*/
#define TH_DECCTL_SET_STRIPE_CB (7)
/**Enables telemetry and sets the macroblock display mode */
#define TH_DECCTL_SET_TELEMETRY_MBMODE (9)
/**Enables telemetry and sets the motion vector display mode */
#define TH_DECCTL_SET_TELEMETRY_MV (11)
/**Enables telemetry and sets the adaptive quantization display mode */
#define TH_DECCTL_SET_TELEMETRY_QI (13)
/**Enables telemetry and sets the bitstream breakdown visualization mode */
#define TH_DECCTL_SET_TELEMETRY_BITS (15)
/*@}*/
@ -289,6 +306,7 @@ extern int th_decode_packetin(th_dec_ctx *_dec,const ogg_packet *_op,
* It may be freed or overwritten without notification when
* subsequent frames are decoded.
* \retval 0 Success
* \retval TH_EFAULT \a _dec or \a _ycbcr was <tt>NULL</tt>.
*/
extern int th_decode_ycbcr_out(th_dec_ctx *_dec,
th_ycbcr_buffer _ycbcr);
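
The new post-processing comments state that the level must lie between zero (off) and the maximum reported by TH_DECCTL_GET_PPLEVEL_MAX. A caller that accepts a level from user input might clamp it first. The sketch below assumes the maximum has already been queried; clamp_pplevel is a hypothetical helper, not a libtheora function.

```c
#include <assert.h>

/* Hypothetical helper: clamp a requested post-processing level to the
   range [0, max_level], where max_level is the value reported by the
   decoder for TH_DECCTL_GET_PPLEVEL_MAX. */
static int clamp_pplevel(int requested,int max_level){
  /* A negative maximum would indicate a failed query. */
  assert(max_level>=0);
  if(requested<0)return 0;
  if(requested>max_level)return max_level;
  return requested;
}
```

In real use the maximum would come from th_decode_ctl() with TH_DECCTL_GET_PPLEVEL_MAX and an int buffer of size sizeof(int), and the clamped value would then be passed with the set-level control code.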


@ -5,7 +5,7 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2003 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
@ -49,26 +49,20 @@ extern "C" {
* <tt>NULL</tt> and \a _buf_sz is not zero, or \a _buf is
* non-<tt>NULL</tt> and \a _buf_sz is not
* <tt>sizeof(#th_huff_code)*#TH_NHUFFMAN_TABLES*#TH_NDCT_TOKENS</tt>.
* \retval TH_IMPL Not supported by this implementation.*/
* \retval TH_EIMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_HUFFMAN_CODES (0)
/**Sets the quantization parameters to use.
* The parameters are copied, not stored by reference, so they can be freed
* after this call.
* <tt>NULL</tt> may be specified to revert to the default parameters.
* For the current encoder, <tt>scale[ci!=0][qi]</tt> must be no greater than
* <tt>scale[ci!=0][qi-1]</tt> and <tt>base[qti][pli][qi][ci]</tt> must be no
* greater than <tt>base[qti][pli][qi-1][ci]</tt>.
* These two conditions ensure that the actual quantizer for a given \a qti,
* \a pli, and \a ci does not increase as \a qi increases.
*
* \param[in] _buf #th_quant_info
* \retval TH_EFAULT \a _enc_ctx is <tt>NULL</tt>.
* \retval TH_EINVAL Encoding has already begun, the quantization parameters
* do not meet one of the above stated conditions, \a _buf
* is <tt>NULL</tt> and \a _buf_sz is not zero, or \a _buf
* is non-<tt>NULL</tt> and \a _buf_sz is not
* <tt>sizeof(#th_quant_info)</tt>.
* \retval TH_IMPL Not supported by this implementation.*/
* \retval TH_EINVAL Encoding has already begun, \a _buf is
* <tt>NULL</tt> and \a _buf_sz is not zero,
* or \a _buf is non-<tt>NULL</tt> and
* \a _buf_sz is not <tt>sizeof(#th_quant_info)</tt>.
* \retval TH_EIMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_QUANT_PARAMS (2)
/**Sets the maximum distance between key frames.
* This can be changed during an encode, but will be bounded by
@ -81,12 +75,12 @@ extern "C" {
* \param[out] _buf <tt>ogg_uint32_t</tt>: The actual maximum distance set.
* \retval TH_EFAULT \a _enc_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(ogg_uint32_t)</tt>.
* \retval TH_IMPL Not supported by this implementation.*/
* \retval TH_EIMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_KEYFRAME_FREQUENCY_FORCE (4)
/**Disables any encoder features that would prevent lossless transcoding back
* to VP3.
* This primarily means disabling block-level QI values and not using 4MV mode
* when any of the luma blocks in a macro block are not coded.
* This primarily means disabling block-adaptive quantization and always coding
* all four luma blocks in a macro block when 4MV is used.
* It also includes using the VP3 quantization tables and Huffman codes; if you
* set them explicitly after calling this function, the resulting stream will
* not be VP3-compatible.
@ -109,7 +103,7 @@ extern "C" {
* tables and codebooks from being set.
* \retval TH_EFAULT \a _enc_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(int)</tt>.
* \retval TH_IMPL Not supported by this implementation.*/
* \retval TH_EIMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_VP3_COMPATIBLE (10)
/**Gets the maximum speed level.
* Higher speed levels favor quicker encoding over better quality per bit.
@ -117,28 +111,254 @@ extern "C" {
* may actually improve, but in this case bitrate will also likely increase.
* In any case, overall rate/distortion performance will probably decrease.
* The maximum value, and the meaning of each value, may change depending on
* the current encoding mode (VBR vs. CQI, etc.).
* the current encoding mode (VBR vs. constant quality, etc.).
*
* \param[out] _buf int: The maximum encoding speed level.
* \param[out] _buf <tt>int</tt>: The maximum encoding speed level.
* \retval TH_EFAULT \a _enc_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(int)</tt>.
* \retval TH_IMPL Not supported by this implementation in the current
* \retval TH_EIMPL Not supported by this implementation in the current
* encoding mode.*/
#define TH_ENCCTL_GET_SPLEVEL_MAX (12)
/**Sets the speed level.
* By default, the slowest speed (0) is used.
* The current speed level may be retrieved using #TH_ENCCTL_GET_SPLEVEL.
*
* \param[in] _buf int: The new encoding speed level.
* 0 is slowest, larger values use less CPU.
* \param[in] _buf <tt>int</tt>: The new encoding speed level.
* 0 is slowest, larger values use less CPU.
* \retval TH_EFAULT \a _enc_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(int)</tt>, or the
* encoding speed level is out of bounds.
* The maximum encoding speed level may be
* implementation- and encoding mode-specific, and can be
* obtained via #TH_ENCCTL_GET_SPLEVEL_MAX.
* \retval TH_IMPL Not supported by this implementation in the current
* \retval TH_EIMPL Not supported by this implementation in the current
* encoding mode.*/
#define TH_ENCCTL_SET_SPLEVEL (14)
/**Gets the current speed level.
* The default speed level may vary according to encoder implementation, but if
* this control code is not supported (it returns #TH_EIMPL), the default may
* be assumed to be the slowest available speed (0).
* The maximum encoding speed level may be implementation- and encoding
* mode-specific, and can be obtained via #TH_ENCCTL_GET_SPLEVEL_MAX.
*
* \param[out] _buf <tt>int</tt>: The current encoding speed level.
* 0 is slowest, larger values use less CPU.
* \retval TH_EFAULT \a _enc_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(int)</tt>.
* \retval TH_EIMPL Not supported by this implementation in the current
* encoding mode.*/
#define TH_ENCCTL_GET_SPLEVEL (16)
/**Sets the number of duplicates of the next frame to produce.
* Although libtheora can encode duplicate frames very cheaply, it costs some
* amount of CPU to detect them, and a run of duplicates cannot span a
* keyframe boundary.
* This control code tells the encoder to produce the specified number of extra
* duplicates of the next frame.
* This allows the encoder to make smarter keyframe placement decisions and
* rate control decisions, and reduces CPU usage as well, when compared to
* just submitting the same frame for encoding multiple times.
* This setting only applies to the next frame submitted for encoding.
* You MUST call th_encode_packetout() repeatedly until it returns 0, or the
* extra duplicate frames will be lost.
*
* \param[in] _buf <tt>int</tt>: The number of duplicates to produce.
* If this is negative or zero, no duplicates will be produced.
* \retval TH_EFAULT \a _enc_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(int)</tt>, or the
* number of duplicates is greater than or equal to the
* maximum keyframe interval.
* In the latter case, NO duplicate frames will be produced.
* You must ensure that the maximum keyframe interval is set
* larger than the maximum number of duplicates you will
* ever wish to insert prior to encoding.
* \retval TH_EIMPL Not supported by this implementation in the current
* encoding mode.*/
#define TH_ENCCTL_SET_DUP_COUNT (18)
/**Modifies the default bitrate management behavior.
* Use to allow or disallow frame dropping, and to enable or disable capping
* bit reservoir overflows and underflows.
* See \ref ratectlflags "the list of available flags".
* The flags are set by default to
* <tt>#TH_RATECTL_DROP_FRAMES|#TH_RATECTL_CAP_OVERFLOW</tt>.
*
* \param[in] _buf <tt>int</tt>: Any combination of
* \ref ratectlflags "the available flags":
* - #TH_RATECTL_DROP_FRAMES: Enable frame dropping.
* - #TH_RATECTL_CAP_OVERFLOW: Don't bank excess bits for later
* use.
* - #TH_RATECTL_CAP_UNDERFLOW: Don't try to make up shortfalls
* later.
* \retval TH_EFAULT \a _enc_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(int)</tt> or rate control
* is not enabled.
* \retval TH_EIMPL Not supported by this implementation in the current
* encoding mode.*/
#define TH_ENCCTL_SET_RATE_FLAGS (20)
/**Sets the size of the bitrate management bit reservoir as a function
* of number of frames.
* The reservoir size affects how quickly bitrate management reacts to
* instantaneous changes in the video complexity.
* Larger reservoirs react more slowly, and provide better overall quality, but
* require more buffering by a client, adding more latency to live streams.
* By default, libtheora sets the reservoir to the maximum distance between
* keyframes, subject to a minimum and maximum limit.
* This call may be used to increase or decrease the reservoir, increasing or
* decreasing the allowed temporary variance in bitrate.
* An implementation may impose some limits on the size of a reservoir it can
* handle, in which case the actual reservoir size may not be exactly what was
* requested.
* The actual value set will be returned.
*
* \param[in] _buf <tt>int</tt>: Requested size of the reservoir measured in
* frames.
* \param[out] _buf <tt>int</tt>: The actual size of the reservoir set.
* \retval TH_EFAULT \a _enc_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(int)</tt>, or rate control
* is not enabled. The buffer has an
* implementation-defined minimum and maximum size, and
* the value in \a _buf will be adjusted to match the
* actual value set.
* \retval TH_EIMPL Not supported by this implementation in the current
* encoding mode.*/
#define TH_ENCCTL_SET_RATE_BUFFER (22)
/**Enable pass 1 of two-pass encoding mode and retrieve the first pass metrics.
* Pass 1 mode must be enabled before the first frame is encoded, and a target
* bitrate must have already been specified to the encoder.
* Although this does not have to be the exact rate that will be used in the
* second pass, closer values may produce better results.
* The first call returns the size of the two-pass header data, along with some
* placeholder content, and sets the encoder into pass 1 mode implicitly.
* Then, a subsequent call must be made after each call to
* th_encode_ycbcr_in() to retrieve the metrics for that frame.
* An additional, final call must be made to retrieve the summary data,
* containing such information as the total number of frames, etc.
* This must be stored in place of the placeholder data that was returned
* in the first call, before the frame metrics data.
* All of this data must be presented back to the encoder during pass 2 using
* #TH_ENCCTL_2PASS_IN.
*
* \param[out] _buf <tt>char *</tt>: Returns a pointer to internal storage
* containing the two pass metrics data.
* This storage is only valid until the next call, or until the
* encoder context is freed, and must be copied by the
* application.
* \retval >=0 The number of bytes of metric data available in the
* returned buffer.
* \retval TH_EFAULT \a _enc_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL \a _buf_sz is not <tt>sizeof(char *)</tt>, no target
* bitrate has been set, or the first call was made after
* the first frame was submitted for encoding.
* \retval TH_EIMPL Not supported by this implementation.*/
#define TH_ENCCTL_2PASS_OUT (24)
/**Submits two-pass encoding metric data collected during the first encoding
* pass to the second pass.
* The first call must be made before the first frame is encoded, and a target
* bitrate must have already been specified to the encoder.
* It sets the encoder to pass 2 mode implicitly; this cannot be disabled.
* The encoder may require reading data from some or all of the frames in
* advance, depending on, e.g., the reservoir size used in the second pass.
* You must call this function repeatedly before each frame to provide data
* until either a) it fails to consume all of the data presented or b) all of
* the pass 1 data has been consumed.
* In the first case, you must save the remaining data to be presented after
* the next frame.
* You can call this function with a NULL argument to get an upper bound on
* the number of bytes that will be required before the next frame.
*
* When pass 2 is first enabled, the default bit reservoir is set to the entire
* file; this gives maximum flexibility but can lead to very high peak rates.
* You can subsequently set it to another value with #TH_ENCCTL_SET_RATE_BUFFER
* (e.g., to set it to the keyframe interval for non-live streaming), however,
* you may then need to provide more data before the next frame.
*
* \param[in] _buf <tt>char[]</tt>: A buffer containing the data returned by
* #TH_ENCCTL_2PASS_OUT in pass 1.
* You may pass <tt>NULL</tt> for \a _buf to return an upper
* bound on the number of additional bytes needed before the
* next frame.
* The summary data returned at the end of pass 1 must be at
* the head of the buffer on the first call with a
* non-<tt>NULL</tt> \a _buf, and the placeholder data
* returned at the start of pass 1 should be omitted.
* After each call you should advance this buffer by the number
* of bytes consumed.
* \retval >0 The number of bytes of metric data required/consumed.
* \retval 0 No more data is required before the next frame.
* \retval TH_EFAULT \a _enc_ctx is <tt>NULL</tt>.
* \retval TH_EINVAL No target bitrate has been set, or the first call was
* made after the first frame was submitted for
* encoding.
* \retval TH_ENOTFORMAT The data did not appear to be pass 1 data from a
* compatible implementation of this library.
* \retval TH_EBADHEADER The data was invalid; this may be returned when
* attempting to read an aborted pass 1 file that still
* has the placeholder data in place of the summary
* data.
* \retval TH_EIMPL Not supported by this implementation.*/
#define TH_ENCCTL_2PASS_IN (26)
/**Sets the current encoding quality.
* This is only valid so long as no bitrate has been specified, either through
* the #th_info struct used to initialize the encoder or through
* #TH_ENCCTL_SET_BITRATE (this restriction may be relaxed in a future
* version).
* If it is set before the headers are emitted, the target quality encoded in
* them will be updated.
*
* \param[in] _buf <tt>int</tt>: The new target quality, in the range 0...63,
* inclusive.
* \retval 0 Success.
* \retval TH_EFAULT \a _enc_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL A target bitrate has already been specified, or the
* quality index was not in the range 0...63.
* \retval TH_EIMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_QUALITY (28)
/**Sets the current encoding bitrate.
* Once a bitrate is set, the encoder must use a rate-controlled mode for all
* future frames (this restriction may be relaxed in a future version).
* If it is set before the headers are emitted, the target bitrate encoded in
* them will be updated.
* Due to the buffer delay, the exact bitrate of each section of the encode is
* not guaranteed.
* The encoder may have already used more bits than allowed for the frames it
* has encoded, expecting to make them up in future frames, or it may have
* used fewer, holding the excess in reserve.
* The exact transition between the two bitrates is not well-defined by this
* API, but may be affected by flags set with #TH_ENCCTL_SET_RATE_FLAGS.
* After a number of frames equal to the buffer delay, one may expect further
* output to average at the target bitrate.
*
* \param[in] _buf <tt>long</tt>: The new target bitrate, in bits per second.
* \retval 0 Success.
* \retval TH_EFAULT \a _enc_ctx or \a _buf is <tt>NULL</tt>.
* \retval TH_EINVAL The target bitrate was not positive.
* \retval TH_EIMPL Not supported by this implementation.*/
#define TH_ENCCTL_SET_BITRATE (30)
/*@}*/
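
The feeding rule documented for TH_ENCCTL_2PASS_IN (present data before each frame until the encoder either stops short of consuming everything or all pass 1 data is used up, then keep the remainder for later) can be sketched as a small loop. Everything named here is an assumption for illustration: pass2_in_fn stands in for a call such as th_encode_ctl(enc,TH_ENCCTL_2PASS_IN,buf,n), mock_pass2_in is a toy consumer, and the chunk parameter imitates data arriving in pieces from a file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real control call.
   Returns the number of bytes consumed (>0), 0 when no more data is needed
   before the next frame, or a negative error code. */
typedef int (*pass2_in_fn)(void *ctx,const unsigned char *buf,size_t n);

/* Feeds pass 1 data starting at *pos in pieces of at most chunk bytes.
   Stops when a piece is not fully consumed (the rest is kept for the next
   frame) or when all data has been consumed.
   Returns 0 on success or the negative error code from the consumer. */
static int feed_pass2(pass2_in_fn in,void *ctx,const unsigned char *data,
 size_t len,size_t chunk,size_t *pos){
  assert(chunk>0);
  while(*pos<len){
    size_t n=len-*pos;
    int ret;
    if(n>chunk)n=chunk;
    ret=in(ctx,data+*pos,n);
    if(ret<0)return ret;
    /* Advance past the bytes that were consumed. */
    *pos+=(size_t)ret;
    /* Not everything was taken: the consumer has enough for this frame. */
    if((size_t)ret<n)break;
  }
  return 0;
}

/* Mock consumer for illustration only: ctx points to the number of bytes
   still wanted before the next frame. */
static int mock_pass2_in(void *ctx,const unsigned char *buf,size_t n){
  size_t *need=(size_t *)ctx;
  size_t take=n<*need?n:*need;
  (void)buf;
  *need-=take;
  return (int)take;
}
```

The position pointer is the only state the caller has to carry between frames, which matches the instruction to advance the buffer by the number of bytes consumed after each call.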
/**\name TH_ENCCTL_SET_RATE_FLAGS flags
* \anchor ratectlflags
* These are the flags available for use with #TH_ENCCTL_SET_RATE_FLAGS.*/
/*@{*/
/**Drop frames to keep within bitrate buffer constraints.
* This can have a severe impact on quality, but is the only way to ensure that
* bitrate targets are met at low rates during sudden bursts of activity.*/
#define TH_RATECTL_DROP_FRAMES (0x1)
/**Ignore bitrate buffer overflows.
* If the encoder uses so few bits that the reservoir of available bits
* overflows, ignore the excess.
* The encoder will not try to use these extra bits in future frames.
* At high rates this may cause the result to be undersized, but allows a
* client to play the stream using a finite buffer; it should normally be
* enabled.*/
#define TH_RATECTL_CAP_OVERFLOW (0x2)
/**Ignore bitrate buffer underflows.
* If the encoder uses so many bits that the reservoir of available bits
* underflows, ignore the deficit.
* The encoder will not try to make up these extra bits in future frames.
* At low rates this may cause the result to be oversized; it should normally
* be disabled.*/
#define TH_RATECTL_CAP_UNDERFLOW (0x4)
/*@}*/
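
The rate control flags above are plain bit masks, so the documented default and a modified setting can be built with bitwise operators. A minimal sketch follows; the three flag values are copied from this hunk, while RATE_FLAGS_DEFAULT, rate_flags_without_drop and rate_flags_known are hypothetical names, and the assumption that only the three documented bits are meaningful is not confirmed by the header.

```c
#include <assert.h>

/* Flag values as defined in the hunk above. */
#define TH_RATECTL_DROP_FRAMES   (0x1)
#define TH_RATECTL_CAP_OVERFLOW  (0x2)
#define TH_RATECTL_CAP_UNDERFLOW (0x4)

/* Hypothetical name for the documented default combination. */
#define RATE_FLAGS_DEFAULT (TH_RATECTL_DROP_FRAMES|TH_RATECTL_CAP_OVERFLOW)

/* Hypothetical helper: disable frame dropping, leave other bits alone. */
static int rate_flags_without_drop(int flags){
  return flags&~TH_RATECTL_DROP_FRAMES;
}

/* Hypothetical helper: true if only the three documented bits are set.
   Assumption: no other bits are defined for this control code. */
static int rate_flags_known(int flags){
  int known=TH_RATECTL_DROP_FRAMES|TH_RATECTL_CAP_OVERFLOW
   |TH_RATECTL_CAP_UNDERFLOW;
  return (flags&~known)==0;
}
```

The resulting int would be handed to the encoder through TH_ENCCTL_SET_RATE_FLAGS with a buffer size of sizeof(int), as the parameter description requires.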


@ -0,0 +1,173 @@
INCLUDES = -I$(top_srcdir)/include
AM_CFLAGS = $(OGG_CFLAGS) $(CAIRO_CFLAGS)
EXTRA_DIST = \
cpu.c \
encoder_disabled.c \
x86/mmxencfrag.c \
x86/mmxfdct.c \
x86/sse2fdct.c \
x86/x86enc.c \
x86/x86enc.h \
x86/mmxfrag.c \
x86/mmxfrag.h \
x86/mmxidct.c \
x86/mmxloop.h \
x86/mmxstate.c \
x86/x86int.h \
x86/x86state.c \
x86_vc
lib_LTLIBRARIES = libtheoradec.la libtheoraenc.la libtheora.la
if THEORA_DISABLE_ENCODE
encoder_uniq_sources = \
encoder_disabled.c
encoder_sources = \
$(encoder_uniq_sources)
else
encoder_uniq_x86_sources = \
x86/mmxencfrag.c \
x86/mmxfdct.c \
x86/x86enc.c
encoder_uniq_x86_64_sources = \
x86/sse2fdct.c
encoder_shared_x86_sources = \
x86/mmxfrag.c \
x86/mmxidct.c \
x86/mmxstate.c \
x86/x86state.c
encoder_shared_x86_64_sources =
if CPU_x86_64
encoder_uniq_arch_sources = \
$(encoder_uniq_x86_sources) \
$(encoder_uniq_x86_64_sources)
encoder_shared_arch_sources = \
$(encoder_shared_x86_sources) \
$(encoder_shared_x86_64_sources)
else
if CPU_x86_32
encoder_uniq_arch_sources = $(encoder_uniq_x86_sources)
encoder_shared_arch_sources = $(encoder_shared_x86_sources)
else
encoder_uniq_arch_sources =
encoder_shared_arch_sources =
endif
endif
encoder_uniq_sources = \
analyze.c \
fdct.c \
encfrag.c \
encapiwrapper.c \
encinfo.c \
encode.c \
enquant.c \
huffenc.c \
mathops.c \
mcenc.c \
rate.c \
tokenize.c \
$(encoder_uniq_arch_sources)
encoder_sources = \
apiwrapper.c \
fragment.c \
idct.c \
internal.c \
state.c \
quant.c \
$(encoder_shared_arch_sources) \
$(encoder_uniq_sources)
endif
decoder_x86_sources = \
x86/mmxidct.c \
x86/mmxfrag.c \
x86/mmxstate.c \
x86/x86state.c
if CPU_x86_64
decoder_arch_sources = $(decoder_x86_sources)
else
if CPU_x86_32
decoder_arch_sources = $(decoder_x86_sources)
else
decoder_arch_sources =
endif
endif
decoder_sources = \
apiwrapper.c \
bitpack.c \
decapiwrapper.c \
decinfo.c \
decode.c \
dequant.c \
fragment.c \
huffdec.c \
idct.c \
info.c \
internal.c \
quant.c \
state.c \
$(decoder_arch_sources)
noinst_HEADERS = \
cpu.h \
internal.h \
encint.h \
enquant.h \
huffenc.h \
mathops.h \
modedec.h \
x86/x86enc.h \
apiwrapper.h \
bitpack.h \
dct.h \
decint.h \
dequant.h \
huffdec.h \
huffman.h \
ocintrin.h \
quant.h \
x86/mmxfrag.h \
x86/mmxloop.h \
x86/x86int.h
libtheoradec_la_SOURCES = \
$(decoder_sources) \
Version_script-dec theoradec.exp
libtheoradec_la_LDFLAGS = \
-version-info @THDEC_LIB_CURRENT@:@THDEC_LIB_REVISION@:@THDEC_LIB_AGE@ \
@THEORADEC_LDFLAGS@ @CAIRO_LIBS@
libtheoraenc_la_SOURCES = \
$(encoder_sources) \
Version_script-enc theoraenc.exp
libtheoraenc_la_LDFLAGS = \
-version-info @THENC_LIB_CURRENT@:@THENC_LIB_REVISION@:@THENC_LIB_AGE@ \
@THEORAENC_LDFLAGS@ $(OGG_LIBS)
libtheora_la_SOURCES = \
$(decoder_sources) \
$(encoder_uniq_sources) \
Version_script theora.exp
libtheora_la_LDFLAGS = \
-version-info @TH_LIB_CURRENT@:@TH_LIB_REVISION@:@TH_LIB_AGE@ \
@THEORA_LDFLAGS@ @CAIRO_LIBS@ $(OGG_LIBS)
debug:
$(MAKE) all CFLAGS="@DEBUG@"
profile:
$(MAKE) all CFLAGS="@PROFILE@"
# construct various symbol export list files
.def.exp : defexp.awk
awk -f defexp.awk $< > $@


@ -0,0 +1,845 @@
# Makefile.in generated by automake 1.6.3 from Makefile.am.
# @configure_input@
# Copyright 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002
# Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.
@SET_MAKE@
SHELL = @SHELL@
srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
prefix = @prefix@
exec_prefix = @exec_prefix@
bindir = @bindir@
sbindir = @sbindir@
libexecdir = @libexecdir@
datadir = @datadir@
sysconfdir = @sysconfdir@
sharedstatedir = @sharedstatedir@
localstatedir = @localstatedir@
libdir = @libdir@
infodir = @infodir@
mandir = @mandir@
includedir = @includedir@
oldincludedir = /usr/include
pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@
top_builddir = ..
ACLOCAL = @ACLOCAL@
AUTOCONF = @AUTOCONF@
AUTOMAKE = @AUTOMAKE@
AUTOHEADER = @AUTOHEADER@
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd
INSTALL = @INSTALL@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
INSTALL_DATA = @INSTALL_DATA@
install_sh_DATA = $(install_sh) -c -m 644
install_sh_PROGRAM = $(install_sh) -c
install_sh_SCRIPT = $(install_sh) -c
INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_HEADER = $(INSTALL_DATA)
transform = @program_transform_name@
NORMAL_INSTALL = :
PRE_INSTALL = :
POST_INSTALL = :
NORMAL_UNINSTALL = :
PRE_UNINSTALL = :
POST_UNINSTALL = :
host_alias = @host_alias@
host_triplet = @host@
EXEEXT = @EXEEXT@
OBJEXT = @OBJEXT@
PATH_SEPARATOR = @PATH_SEPARATOR@
ACLOCAL_AMFLAGS = @ACLOCAL_AMFLAGS@
AMTAR = @AMTAR@
AR = @AR@
ARGZ_H = @ARGZ_H@
AS = @AS@
AWK = @AWK@
BUILDABLE_EXAMPLES = @BUILDABLE_EXAMPLES@
CAIRO_CFLAGS = @CAIRO_CFLAGS@
CAIRO_LIBS = @CAIRO_LIBS@
CC = @CC@
CPP = @CPP@
CXX = @CXX@
CXXCPP = @CXXCPP@
DEBUG = @DEBUG@
DEPDIR = @DEPDIR@
DLLTOOL = @DLLTOOL@
DSYMUTIL = @DSYMUTIL@
DUMPBIN = @DUMPBIN@
F77 = @F77@
GCJ = @GCJ@
GCJFLAGS = @GCJFLAGS@
GETOPT_OBJS = @GETOPT_OBJS@
GREP = @GREP@
HAVE_BIBTEX = @HAVE_BIBTEX@
HAVE_DOXYGEN = @HAVE_DOXYGEN@
HAVE_PDFLATEX = @HAVE_PDFLATEX@
HAVE_PKG_CONFIG = @HAVE_PKG_CONFIG@
HAVE_TRANSFIG = @HAVE_TRANSFIG@
HAVE_VALGRIND = @HAVE_VALGRIND@
INCLTDL = @INCLTDL@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LD = @LD@
LIBADD_DL = @LIBADD_DL@
LIBADD_DLD_LINK = @LIBADD_DLD_LINK@
LIBADD_DLOPEN = @LIBADD_DLOPEN@
LIBADD_SHL_LOAD = @LIBADD_SHL_LOAD@
LIBLTDL = @LIBLTDL@
LIBM = @LIBM@
LIBTOOL = @LIBTOOL@
LIPO = @LIPO@
LN_S = @LN_S@
LTDLDEPS = @LTDLDEPS@
LTDLINCL = @LTDLINCL@
LTDLOPEN = @LTDLOPEN@
LT_CONFIG_H = @LT_CONFIG_H@
LT_DLLOADERS = @LT_DLLOADERS@
LT_DLPREOPEN = @LT_DLPREOPEN@
MAINT = @MAINT@
NM = @NM@
NMEDIT = @NMEDIT@
OBJDUMP = @OBJDUMP@
OGG_CFLAGS = @OGG_CFLAGS@
OGG_LIBS = @OGG_LIBS@
OSS_LIBS = @OSS_LIBS@
OTOOL = @OTOOL@
OTOOL64 = @OTOOL64@
PACKAGE = @PACKAGE@
PKG_CONFIG = @PKG_CONFIG@
PNG_CFLAGS = @PNG_CFLAGS@
PNG_LIBS = @PNG_LIBS@
PROFILE = @PROFILE@
RANLIB = @RANLIB@
RC = @RC@
SDL_CFLAGS = @SDL_CFLAGS@
SDL_CONFIG = @SDL_CONFIG@
SDL_LIBS = @SDL_LIBS@
SED = @SED@
STRIP = @STRIP@
THDEC_LIB_AGE = @THDEC_LIB_AGE@
THDEC_LIB_CURRENT = @THDEC_LIB_CURRENT@
THDEC_LIB_REVISION = @THDEC_LIB_REVISION@
THENC_LIB_AGE = @THENC_LIB_AGE@
THENC_LIB_CURRENT = @THENC_LIB_CURRENT@
THENC_LIB_REVISION = @THENC_LIB_REVISION@
THEORADEC_LDFLAGS = @THEORADEC_LDFLAGS@
THEORAENC_LDFLAGS = @THEORAENC_LDFLAGS@
THEORA_LDFLAGS = @THEORA_LDFLAGS@
TH_LIB_AGE = @TH_LIB_AGE@
TH_LIB_CURRENT = @TH_LIB_CURRENT@
TH_LIB_REVISION = @TH_LIB_REVISION@
VALGRIND_ENVIRONMENT = @VALGRIND_ENVIRONMENT@
VERSION = @VERSION@
VORBISENC_LIBS = @VORBISENC_LIBS@
VORBISFILE_LIBS = @VORBISFILE_LIBS@
VORBIS_CFLAGS = @VORBIS_CFLAGS@
VORBIS_LIBS = @VORBIS_LIBS@
am__include = @am__include@
am__quote = @am__quote@
install_sh = @install_sh@
lt_ECHO = @lt_ECHO@
ltdl_LIBOBJS = @ltdl_LIBOBJS@
ltdl_LTLIBOBJS = @ltdl_LTLIBOBJS@
sys_symbol_underscore = @sys_symbol_underscore@
INCLUDES = -I$(top_srcdir)/include
AM_CFLAGS = $(OGG_CFLAGS) $(CAIRO_CFLAGS)
EXTRA_DIST = \
cpu.c \
encoder_disabled.c \
x86/mmxencfrag.c \
x86/mmxfdct.c \
x86/sse2fdct.c \
x86/x86enc.c \
x86/x86enc.h \
x86/mmxfrag.c \
x86/mmxfrag.h \
x86/mmxidct.c \
x86/mmxloop.h \
x86/mmxstate.c \
x86/x86int.h \
x86/x86state.c \
x86_vc
lib_LTLIBRARIES = libtheoradec.la libtheoraenc.la libtheora.la
@THEORA_DISABLE_ENCODE_TRUE@encoder_uniq_sources = \
@THEORA_DISABLE_ENCODE_TRUE@ encoder_disabled.c
@THEORA_DISABLE_ENCODE_FALSE@encoder_uniq_sources = \
@THEORA_DISABLE_ENCODE_FALSE@ analyze.c \
@THEORA_DISABLE_ENCODE_FALSE@ fdct.c \
@THEORA_DISABLE_ENCODE_FALSE@ encfrag.c \
@THEORA_DISABLE_ENCODE_FALSE@ encapiwrapper.c \
@THEORA_DISABLE_ENCODE_FALSE@ encinfo.c \
@THEORA_DISABLE_ENCODE_FALSE@ encode.c \
@THEORA_DISABLE_ENCODE_FALSE@ enquant.c \
@THEORA_DISABLE_ENCODE_FALSE@ huffenc.c \
@THEORA_DISABLE_ENCODE_FALSE@ mathops.c \
@THEORA_DISABLE_ENCODE_FALSE@ mcenc.c \
@THEORA_DISABLE_ENCODE_FALSE@ rate.c \
@THEORA_DISABLE_ENCODE_FALSE@ tokenize.c \
@THEORA_DISABLE_ENCODE_FALSE@ $(encoder_uniq_arch_sources)
@THEORA_DISABLE_ENCODE_TRUE@encoder_sources = \
@THEORA_DISABLE_ENCODE_TRUE@ $(encoder_uniq_sources)
@THEORA_DISABLE_ENCODE_FALSE@encoder_sources = \
@THEORA_DISABLE_ENCODE_FALSE@ apiwrapper.c \
@THEORA_DISABLE_ENCODE_FALSE@ fragment.c \
@THEORA_DISABLE_ENCODE_FALSE@ idct.c \
@THEORA_DISABLE_ENCODE_FALSE@ internal.c \
@THEORA_DISABLE_ENCODE_FALSE@ state.c \
@THEORA_DISABLE_ENCODE_FALSE@ quant.c \
@THEORA_DISABLE_ENCODE_FALSE@ $(encoder_shared_arch_sources) \
@THEORA_DISABLE_ENCODE_FALSE@ $(encoder_uniq_sources)
@THEORA_DISABLE_ENCODE_FALSE@encoder_uniq_x86_sources = \
@THEORA_DISABLE_ENCODE_FALSE@ x86/mmxencfrag.c \
@THEORA_DISABLE_ENCODE_FALSE@ x86/mmxfdct.c \
@THEORA_DISABLE_ENCODE_FALSE@ x86/x86enc.c
@THEORA_DISABLE_ENCODE_FALSE@encoder_uniq_x86_64_sources = \
@THEORA_DISABLE_ENCODE_FALSE@ x86/sse2fdct.c
@THEORA_DISABLE_ENCODE_FALSE@encoder_shared_x86_sources = \
@THEORA_DISABLE_ENCODE_FALSE@ x86/mmxfrag.c \
@THEORA_DISABLE_ENCODE_FALSE@ x86/mmxidct.c \
@THEORA_DISABLE_ENCODE_FALSE@ x86/mmxstate.c \
@THEORA_DISABLE_ENCODE_FALSE@ x86/x86state.c
@THEORA_DISABLE_ENCODE_FALSE@encoder_shared_x86_64_sources =
@CPU_x86_32_FALSE@@CPU_x86_64_FALSE@@THEORA_DISABLE_ENCODE_FALSE@encoder_uniq_arch_sources =
@CPU_x86_32_TRUE@@CPU_x86_64_FALSE@@THEORA_DISABLE_ENCODE_FALSE@encoder_uniq_arch_sources = $(encoder_uniq_x86_sources)
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@encoder_uniq_arch_sources = \
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@ $(encoder_uniq_x86_sources) \
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@ $(encoder_uniq_x86_64_sources)
@CPU_x86_32_FALSE@@CPU_x86_64_FALSE@@THEORA_DISABLE_ENCODE_FALSE@encoder_shared_arch_sources =
@CPU_x86_32_TRUE@@CPU_x86_64_FALSE@@THEORA_DISABLE_ENCODE_FALSE@encoder_shared_arch_sources = $(encoder_shared_x86_sources)
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@encoder_shared_arch_sources = \
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@ $(encoder_shared_x86_sources) \
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@ $(encoder_shared_x86_64_sources)
decoder_x86_sources = \
x86/mmxidct.c \
x86/mmxfrag.c \
x86/mmxstate.c \
x86/x86state.c
@CPU_x86_32_FALSE@@CPU_x86_64_FALSE@decoder_arch_sources =
@CPU_x86_32_TRUE@@CPU_x86_64_FALSE@decoder_arch_sources = $(decoder_x86_sources)
@CPU_x86_64_TRUE@decoder_arch_sources = $(decoder_x86_sources)
decoder_sources = \
apiwrapper.c \
bitpack.c \
decapiwrapper.c \
decinfo.c \
decode.c \
dequant.c \
fragment.c \
huffdec.c \
idct.c \
info.c \
internal.c \
quant.c \
state.c \
$(decoder_arch_sources)
noinst_HEADERS = \
cpu.h \
internal.h \
encint.h \
enquant.h \
huffenc.h \
mathops.h \
modedec.h \
x86/x86enc.h \
apiwrapper.h \
bitpack.h \
dct.h \
decint.h \
dequant.h \
huffdec.h \
huffman.h \
ocintrin.h \
quant.h \
x86/mmxfrag.h \
x86/mmxloop.h \
x86/x86int.h
libtheoradec_la_SOURCES = \
$(decoder_sources) \
Version_script-dec theoradec.exp
libtheoradec_la_LDFLAGS = \
-version-info @THDEC_LIB_CURRENT@:@THDEC_LIB_REVISION@:@THDEC_LIB_AGE@ \
@THEORADEC_LDFLAGS@ @CAIRO_LIBS@
libtheoraenc_la_SOURCES = \
$(encoder_sources) \
Version_script-enc theoraenc.exp
libtheoraenc_la_LDFLAGS = \
-version-info @THENC_LIB_CURRENT@:@THENC_LIB_REVISION@:@THENC_LIB_AGE@ \
@THEORAENC_LDFLAGS@ $(OGG_LIBS)
libtheora_la_SOURCES = \
$(decoder_sources) \
$(encoder_uniq_sources) \
Version_script theora.exp
libtheora_la_LDFLAGS = \
-version-info @TH_LIB_CURRENT@:@TH_LIB_REVISION@:@TH_LIB_AGE@ \
@THEORA_LDFLAGS@ @CAIRO_LIBS@ $(OGG_LIBS)
subdir = lib
mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs
CONFIG_HEADER = $(top_builddir)/config.h
CONFIG_CLEAN_FILES =
LTLIBRARIES = $(lib_LTLIBRARIES)
libtheora_la_LIBADD =
am__objects_1 = mmxidct.lo mmxfrag.lo mmxstate.lo x86state.lo
@CPU_x86_32_FALSE@@CPU_x86_64_FALSE@am__objects_2 =
@CPU_x86_32_TRUE@@CPU_x86_64_FALSE@am__objects_2 = $(am__objects_1)
@CPU_x86_64_TRUE@am__objects_2 = $(am__objects_1)
am__objects_3 = apiwrapper.lo bitpack.lo decapiwrapper.lo decinfo.lo \
decode.lo dequant.lo fragment.lo huffdec.lo idct.lo info.lo \
internal.lo quant.lo state.lo $(am__objects_2)
@THEORA_DISABLE_ENCODE_FALSE@am__objects_4 = mmxencfrag.lo mmxfdct.lo \
@THEORA_DISABLE_ENCODE_FALSE@ x86enc.lo
@THEORA_DISABLE_ENCODE_FALSE@am__objects_5 = sse2fdct.lo
@CPU_x86_32_FALSE@@CPU_x86_64_FALSE@@THEORA_DISABLE_ENCODE_FALSE@am__objects_6 =
@CPU_x86_32_TRUE@@CPU_x86_64_FALSE@@THEORA_DISABLE_ENCODE_FALSE@am__objects_6 = \
@CPU_x86_32_TRUE@@CPU_x86_64_FALSE@@THEORA_DISABLE_ENCODE_FALSE@ $(am__objects_4)
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@am__objects_6 = \
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@ $(am__objects_4) \
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@ $(am__objects_5)
@THEORA_DISABLE_ENCODE_TRUE@am__objects_7 = encoder_disabled.lo
@THEORA_DISABLE_ENCODE_FALSE@am__objects_7 = analyze.lo fdct.lo \
@THEORA_DISABLE_ENCODE_FALSE@ encfrag.lo encapiwrapper.lo \
@THEORA_DISABLE_ENCODE_FALSE@ encinfo.lo encode.lo enquant.lo \
@THEORA_DISABLE_ENCODE_FALSE@ huffenc.lo mathops.lo mcenc.lo \
@THEORA_DISABLE_ENCODE_FALSE@ rate.lo tokenize.lo \
@THEORA_DISABLE_ENCODE_FALSE@ $(am__objects_6)
am_libtheora_la_OBJECTS = $(am__objects_3) $(am__objects_7)
libtheora_la_OBJECTS = $(am_libtheora_la_OBJECTS)
libtheoradec_la_LIBADD =
am_libtheoradec_la_OBJECTS = $(am__objects_3)
libtheoradec_la_OBJECTS = $(am_libtheoradec_la_OBJECTS)
libtheoraenc_la_LIBADD =
@THEORA_DISABLE_ENCODE_FALSE@am__objects_8 = mmxfrag.lo mmxidct.lo \
@THEORA_DISABLE_ENCODE_FALSE@ mmxstate.lo x86state.lo
@THEORA_DISABLE_ENCODE_FALSE@am__objects_9 =
@CPU_x86_32_FALSE@@CPU_x86_64_FALSE@@THEORA_DISABLE_ENCODE_FALSE@am__objects_10 =
@CPU_x86_32_TRUE@@CPU_x86_64_FALSE@@THEORA_DISABLE_ENCODE_FALSE@am__objects_10 = \
@CPU_x86_32_TRUE@@CPU_x86_64_FALSE@@THEORA_DISABLE_ENCODE_FALSE@ $(am__objects_8)
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@am__objects_10 = \
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@ $(am__objects_8) \
@CPU_x86_64_TRUE@@THEORA_DISABLE_ENCODE_FALSE@ $(am__objects_9)
@THEORA_DISABLE_ENCODE_TRUE@am__objects_11 = $(am__objects_7)
@THEORA_DISABLE_ENCODE_FALSE@am__objects_11 = apiwrapper.lo fragment.lo \
@THEORA_DISABLE_ENCODE_FALSE@ idct.lo internal.lo state.lo \
@THEORA_DISABLE_ENCODE_FALSE@ quant.lo $(am__objects_10) \
@THEORA_DISABLE_ENCODE_FALSE@ $(am__objects_7)
am_libtheoraenc_la_OBJECTS = $(am__objects_11)
libtheoraenc_la_OBJECTS = $(am_libtheoraenc_la_OBJECTS)
DEFS = @DEFS@
DEFAULT_INCLUDES = -I. -I$(srcdir) -I$(top_builddir)
CPPFLAGS = @CPPFLAGS@
LDFLAGS = @LDFLAGS@
LIBS = @LIBS@
depcomp = $(SHELL) $(top_srcdir)/depcomp
am__depfiles_maybe = depfiles
@AMDEP_TRUE@DEP_FILES = ./$(DEPDIR)/analyze.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/apiwrapper.Plo ./$(DEPDIR)/bitpack.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/decapiwrapper.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/decinfo.Plo ./$(DEPDIR)/decode.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/dequant.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/encapiwrapper.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/encfrag.Plo ./$(DEPDIR)/encinfo.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/encode.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/encoder_disabled.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/enquant.Plo ./$(DEPDIR)/fdct.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/fragment.Plo ./$(DEPDIR)/huffdec.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/huffenc.Plo ./$(DEPDIR)/idct.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/info.Plo ./$(DEPDIR)/internal.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/mathops.Plo ./$(DEPDIR)/mcenc.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/mmxencfrag.Plo ./$(DEPDIR)/mmxfdct.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/mmxfrag.Plo ./$(DEPDIR)/mmxidct.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/mmxstate.Plo ./$(DEPDIR)/quant.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/rate.Plo ./$(DEPDIR)/sse2fdct.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/state.Plo ./$(DEPDIR)/tokenize.Plo \
@AMDEP_TRUE@ ./$(DEPDIR)/x86enc.Plo ./$(DEPDIR)/x86state.Plo
COMPILE = $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) \
$(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS)
LTCOMPILE = $(LIBTOOL) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) \
$(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS)
CCLD = $(CC)
LINK = $(LIBTOOL) --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
$(AM_LDFLAGS) $(LDFLAGS) -o $@
CFLAGS = @CFLAGS@
DIST_SOURCES = $(libtheora_la_SOURCES) $(libtheoradec_la_SOURCES) \
$(libtheoraenc_la_SOURCES)
HEADERS = $(noinst_HEADERS)
DIST_COMMON = $(noinst_HEADERS) Makefile.am Makefile.in
SOURCES = $(libtheora_la_SOURCES) $(libtheoradec_la_SOURCES) $(libtheoraenc_la_SOURCES)
all: all-am
.SUFFIXES:
.SUFFIXES: .c .def .exp .lo .o .obj
$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ Makefile.am $(top_srcdir)/configure.ac $(ACLOCAL_M4)
cd $(top_srcdir) && \
$(AUTOMAKE) --gnu lib/Makefile
Makefile: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.in $(top_builddir)/config.status
cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)
libLTLIBRARIES_INSTALL = $(INSTALL)
install-libLTLIBRARIES: $(lib_LTLIBRARIES)
@$(NORMAL_INSTALL)
$(mkinstalldirs) $(DESTDIR)$(libdir)
@list='$(lib_LTLIBRARIES)'; for p in $$list; do \
if test -f $$p; then \
f="`echo $$p | sed -e 's|^.*/||'`"; \
echo " $(LIBTOOL) --mode=install $(libLTLIBRARIES_INSTALL) $(INSTALL_STRIP_FLAG) $$p $(DESTDIR)$(libdir)/$$f"; \
$(LIBTOOL) --mode=install $(libLTLIBRARIES_INSTALL) $(INSTALL_STRIP_FLAG) $$p $(DESTDIR)$(libdir)/$$f; \
else :; fi; \
done
uninstall-libLTLIBRARIES:
@$(NORMAL_UNINSTALL)
@list='$(lib_LTLIBRARIES)'; for p in $$list; do \
p="`echo $$p | sed -e 's|^.*/||'`"; \
echo " $(LIBTOOL) --mode=uninstall rm -f $(DESTDIR)$(libdir)/$$p"; \
$(LIBTOOL) --mode=uninstall rm -f $(DESTDIR)$(libdir)/$$p; \
done
clean-libLTLIBRARIES:
-test -z "$(lib_LTLIBRARIES)" || rm -f $(lib_LTLIBRARIES)
@list='$(lib_LTLIBRARIES)'; for p in $$list; do \
dir="`echo $$p | sed -e 's|/[^/]*$$||'`"; \
test "$$dir" != "$$p" || dir=.; \
echo "rm -f \"$${dir}/so_locations\""; \
rm -f "$${dir}/so_locations"; \
done
mmxidct.lo: x86/mmxidct.c
mmxfrag.lo: x86/mmxfrag.c
mmxstate.lo: x86/mmxstate.c
x86state.lo: x86/x86state.c
mmxencfrag.lo: x86/mmxencfrag.c
mmxfdct.lo: x86/mmxfdct.c
x86enc.lo: x86/x86enc.c
sse2fdct.lo: x86/sse2fdct.c
libtheora.la: $(libtheora_la_OBJECTS) $(libtheora_la_DEPENDENCIES)
$(LINK) -rpath $(libdir) $(libtheora_la_LDFLAGS) $(libtheora_la_OBJECTS) $(libtheora_la_LIBADD) $(LIBS)
libtheoradec.la: $(libtheoradec_la_OBJECTS) $(libtheoradec_la_DEPENDENCIES)
$(LINK) -rpath $(libdir) $(libtheoradec_la_LDFLAGS) $(libtheoradec_la_OBJECTS) $(libtheoradec_la_LIBADD) $(LIBS)
libtheoraenc.la: $(libtheoraenc_la_OBJECTS) $(libtheoraenc_la_DEPENDENCIES)
$(LINK) -rpath $(libdir) $(libtheoraenc_la_LDFLAGS) $(libtheoraenc_la_OBJECTS) $(libtheoraenc_la_LIBADD) $(LIBS)
mostlyclean-compile:
-rm -f *.$(OBJEXT) core *.core
distclean-compile:
-rm -f *.tab.c
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/analyze.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/apiwrapper.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/bitpack.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/decapiwrapper.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/decinfo.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/decode.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/dequant.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/encapiwrapper.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/encfrag.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/encinfo.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/encode.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/encoder_disabled.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/enquant.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/fdct.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/fragment.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/huffdec.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/huffenc.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/idct.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/info.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/internal.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mathops.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mcenc.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mmxencfrag.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mmxfdct.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mmxfrag.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mmxidct.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mmxstate.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/quant.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/rate.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sse2fdct.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/state.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tokenize.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/x86enc.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/x86state.Plo@am__quote@
distclean-depend:
-rm -rf ./$(DEPDIR)
.c.o:
@AMDEP_TRUE@ source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/$*.Po' tmpdepfile='$(DEPDIR)/$*.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(COMPILE) -c `test -f '$<' || echo '$(srcdir)/'`$<
.c.obj:
@AMDEP_TRUE@ source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/$*.Po' tmpdepfile='$(DEPDIR)/$*.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(COMPILE) -c `cygpath -w $<`
.c.lo:
@AMDEP_TRUE@ source='$<' object='$@' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/$*.Plo' tmpdepfile='$(DEPDIR)/$*.TPlo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(LTCOMPILE) -c -o $@ `test -f '$<' || echo '$(srcdir)/'`$<
mmxidct.o: x86/mmxidct.c
@AMDEP_TRUE@ source='x86/mmxidct.c' object='mmxidct.o' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxidct.Po' tmpdepfile='$(DEPDIR)/mmxidct.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxidct.o `test -f 'x86/mmxidct.c' || echo '$(srcdir)/'`x86/mmxidct.c
mmxidct.obj: x86/mmxidct.c
@AMDEP_TRUE@ source='x86/mmxidct.c' object='mmxidct.obj' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxidct.Po' tmpdepfile='$(DEPDIR)/mmxidct.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxidct.obj `cygpath -w x86/mmxidct.c`
mmxidct.lo: x86/mmxidct.c
@AMDEP_TRUE@ source='x86/mmxidct.c' object='mmxidct.lo' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxidct.Plo' tmpdepfile='$(DEPDIR)/mmxidct.TPlo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(LIBTOOL) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxidct.lo `test -f 'x86/mmxidct.c' || echo '$(srcdir)/'`x86/mmxidct.c
mmxfrag.o: x86/mmxfrag.c
@AMDEP_TRUE@ source='x86/mmxfrag.c' object='mmxfrag.o' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxfrag.Po' tmpdepfile='$(DEPDIR)/mmxfrag.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxfrag.o `test -f 'x86/mmxfrag.c' || echo '$(srcdir)/'`x86/mmxfrag.c
mmxfrag.obj: x86/mmxfrag.c
@AMDEP_TRUE@ source='x86/mmxfrag.c' object='mmxfrag.obj' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxfrag.Po' tmpdepfile='$(DEPDIR)/mmxfrag.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxfrag.obj `cygpath -w x86/mmxfrag.c`
mmxfrag.lo: x86/mmxfrag.c
@AMDEP_TRUE@ source='x86/mmxfrag.c' object='mmxfrag.lo' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxfrag.Plo' tmpdepfile='$(DEPDIR)/mmxfrag.TPlo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(LIBTOOL) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxfrag.lo `test -f 'x86/mmxfrag.c' || echo '$(srcdir)/'`x86/mmxfrag.c
mmxstate.o: x86/mmxstate.c
@AMDEP_TRUE@ source='x86/mmxstate.c' object='mmxstate.o' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxstate.Po' tmpdepfile='$(DEPDIR)/mmxstate.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxstate.o `test -f 'x86/mmxstate.c' || echo '$(srcdir)/'`x86/mmxstate.c
mmxstate.obj: x86/mmxstate.c
@AMDEP_TRUE@ source='x86/mmxstate.c' object='mmxstate.obj' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxstate.Po' tmpdepfile='$(DEPDIR)/mmxstate.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxstate.obj `cygpath -w x86/mmxstate.c`
mmxstate.lo: x86/mmxstate.c
@AMDEP_TRUE@ source='x86/mmxstate.c' object='mmxstate.lo' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxstate.Plo' tmpdepfile='$(DEPDIR)/mmxstate.TPlo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(LIBTOOL) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxstate.lo `test -f 'x86/mmxstate.c' || echo '$(srcdir)/'`x86/mmxstate.c
x86state.o: x86/x86state.c
@AMDEP_TRUE@ source='x86/x86state.c' object='x86state.o' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/x86state.Po' tmpdepfile='$(DEPDIR)/x86state.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o x86state.o `test -f 'x86/x86state.c' || echo '$(srcdir)/'`x86/x86state.c
x86state.obj: x86/x86state.c
@AMDEP_TRUE@ source='x86/x86state.c' object='x86state.obj' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/x86state.Po' tmpdepfile='$(DEPDIR)/x86state.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o x86state.obj `cygpath -w x86/x86state.c`
x86state.lo: x86/x86state.c
@AMDEP_TRUE@ source='x86/x86state.c' object='x86state.lo' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/x86state.Plo' tmpdepfile='$(DEPDIR)/x86state.TPlo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(LIBTOOL) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o x86state.lo `test -f 'x86/x86state.c' || echo '$(srcdir)/'`x86/x86state.c
mmxencfrag.o: x86/mmxencfrag.c
@AMDEP_TRUE@ source='x86/mmxencfrag.c' object='mmxencfrag.o' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxencfrag.Po' tmpdepfile='$(DEPDIR)/mmxencfrag.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxencfrag.o `test -f 'x86/mmxencfrag.c' || echo '$(srcdir)/'`x86/mmxencfrag.c
mmxencfrag.obj: x86/mmxencfrag.c
@AMDEP_TRUE@ source='x86/mmxencfrag.c' object='mmxencfrag.obj' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxencfrag.Po' tmpdepfile='$(DEPDIR)/mmxencfrag.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxencfrag.obj `cygpath -w x86/mmxencfrag.c`
mmxencfrag.lo: x86/mmxencfrag.c
@AMDEP_TRUE@ source='x86/mmxencfrag.c' object='mmxencfrag.lo' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxencfrag.Plo' tmpdepfile='$(DEPDIR)/mmxencfrag.TPlo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(LIBTOOL) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxencfrag.lo `test -f 'x86/mmxencfrag.c' || echo '$(srcdir)/'`x86/mmxencfrag.c
mmxfdct.o: x86/mmxfdct.c
@AMDEP_TRUE@ source='x86/mmxfdct.c' object='mmxfdct.o' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxfdct.Po' tmpdepfile='$(DEPDIR)/mmxfdct.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxfdct.o `test -f 'x86/mmxfdct.c' || echo '$(srcdir)/'`x86/mmxfdct.c
mmxfdct.obj: x86/mmxfdct.c
@AMDEP_TRUE@ source='x86/mmxfdct.c' object='mmxfdct.obj' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxfdct.Po' tmpdepfile='$(DEPDIR)/mmxfdct.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxfdct.obj `cygpath -w x86/mmxfdct.c`
mmxfdct.lo: x86/mmxfdct.c
@AMDEP_TRUE@ source='x86/mmxfdct.c' object='mmxfdct.lo' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/mmxfdct.Plo' tmpdepfile='$(DEPDIR)/mmxfdct.TPlo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(LIBTOOL) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o mmxfdct.lo `test -f 'x86/mmxfdct.c' || echo '$(srcdir)/'`x86/mmxfdct.c
x86enc.o: x86/x86enc.c
@AMDEP_TRUE@ source='x86/x86enc.c' object='x86enc.o' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/x86enc.Po' tmpdepfile='$(DEPDIR)/x86enc.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o x86enc.o `test -f 'x86/x86enc.c' || echo '$(srcdir)/'`x86/x86enc.c
x86enc.obj: x86/x86enc.c
@AMDEP_TRUE@ source='x86/x86enc.c' object='x86enc.obj' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/x86enc.Po' tmpdepfile='$(DEPDIR)/x86enc.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o x86enc.obj `cygpath -w x86/x86enc.c`
x86enc.lo: x86/x86enc.c
@AMDEP_TRUE@ source='x86/x86enc.c' object='x86enc.lo' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/x86enc.Plo' tmpdepfile='$(DEPDIR)/x86enc.TPlo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(LIBTOOL) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o x86enc.lo `test -f 'x86/x86enc.c' || echo '$(srcdir)/'`x86/x86enc.c
sse2fdct.o: x86/sse2fdct.c
@AMDEP_TRUE@ source='x86/sse2fdct.c' object='sse2fdct.o' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/sse2fdct.Po' tmpdepfile='$(DEPDIR)/sse2fdct.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o sse2fdct.o `test -f 'x86/sse2fdct.c' || echo '$(srcdir)/'`x86/sse2fdct.c
sse2fdct.obj: x86/sse2fdct.c
@AMDEP_TRUE@ source='x86/sse2fdct.c' object='sse2fdct.obj' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/sse2fdct.Po' tmpdepfile='$(DEPDIR)/sse2fdct.TPo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o sse2fdct.obj `cygpath -w x86/sse2fdct.c`
sse2fdct.lo: x86/sse2fdct.c
@AMDEP_TRUE@ source='x86/sse2fdct.c' object='sse2fdct.lo' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@ depfile='$(DEPDIR)/sse2fdct.Plo' tmpdepfile='$(DEPDIR)/sse2fdct.TPlo' @AMDEPBACKSLASH@
@AMDEP_TRUE@ $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
$(LIBTOOL) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o sse2fdct.lo `test -f 'x86/sse2fdct.c' || echo '$(srcdir)/'`x86/sse2fdct.c
CCDEPMODE = @CCDEPMODE@
mostlyclean-libtool:
-rm -f *.lo
clean-libtool:
-rm -rf .libs _libs
distclean-libtool:
-rm -f libtool
uninstall-info-am:
ETAGS = etags
ETAGSFLAGS =
tags: TAGS
ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES)
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
mkid -fID $$unique
TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \
$(TAGS_FILES) $(LISP)
tags=; \
here=`pwd`; \
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
test -z "$(ETAGS_ARGS)$$tags$$unique" \
|| $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \
$$tags $$unique
GTAGS:
here=`$(am__cd) $(top_builddir) && pwd` \
&& cd $(top_srcdir) \
&& gtags -i $(GTAGS_ARGS) $$here
distclean-tags:
-rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH
DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
top_distdir = ..
distdir = $(top_distdir)/$(PACKAGE)-$(VERSION)
distdir: $(DISTFILES)
$(mkinstalldirs) $(distdir)/x86
@list='$(DISTFILES)'; for file in $$list; do \
if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \
dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \
if test "$$dir" != "$$file" && test "$$dir" != "."; then \
dir="/$$dir"; \
$(mkinstalldirs) "$(distdir)$$dir"; \
else \
dir=''; \
fi; \
if test -d $$d/$$file; then \
if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \
cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \
fi; \
cp -pR $$d/$$file $(distdir)$$dir || exit 1; \
else \
test -f $(distdir)/$$file \
|| cp -p $$d/$$file $(distdir)/$$file \
|| exit 1; \
fi; \
done
check-am: all-am
check: check-am
all-am: Makefile $(LTLIBRARIES) $(HEADERS)
installdirs:
$(mkinstalldirs) $(DESTDIR)$(libdir)
install: install-am
install-exec: install-exec-am
install-data: install-data-am
uninstall: uninstall-am
install-am: all-am
@$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am
installcheck: installcheck-am
install-strip:
$(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \
INSTALL_STRIP_FLAG=-s \
`test -z '$(STRIP)' || \
echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install
mostlyclean-generic:
clean-generic:
distclean-generic:
-rm -f Makefile $(CONFIG_CLEAN_FILES)
maintainer-clean-generic:
@echo "This command is intended for maintainers to use"
@echo "it deletes files that may require special tools to rebuild."
clean: clean-am
clean-am: clean-generic clean-libLTLIBRARIES clean-libtool \
mostlyclean-am
distclean: distclean-am
distclean-am: clean-am distclean-compile distclean-depend \
distclean-generic distclean-libtool distclean-tags
dvi: dvi-am
dvi-am:
info: info-am
info-am:
install-data-am:
install-exec-am: install-libLTLIBRARIES
install-info: install-info-am
install-man:
installcheck-am:
maintainer-clean: maintainer-clean-am
maintainer-clean-am: distclean-am maintainer-clean-generic
mostlyclean: mostlyclean-am
mostlyclean-am: mostlyclean-compile mostlyclean-generic \
mostlyclean-libtool
uninstall-am: uninstall-info-am uninstall-libLTLIBRARIES
.PHONY: GTAGS all all-am check check-am clean clean-generic \
clean-libLTLIBRARIES clean-libtool distclean distclean-compile \
distclean-depend distclean-generic distclean-libtool \
distclean-tags distdir dvi dvi-am info info-am install \
install-am install-data install-data-am install-exec \
install-exec-am install-info install-info-am \
install-libLTLIBRARIES install-man install-strip installcheck \
installcheck-am installdirs maintainer-clean \
maintainer-clean-generic mostlyclean mostlyclean-compile \
mostlyclean-generic mostlyclean-libtool tags uninstall \
uninstall-am uninstall-info-am uninstall-libLTLIBRARIES
debug:
$(MAKE) all CFLAGS="@DEBUG@"
profile:
$(MAKE) all CFLAGS="@PROFILE@"
# construct various symbol export list files
.def.exp : defexp.awk
awk -f defexp.awk $< > $@
# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:


@ -0,0 +1,53 @@
#
# Export file for libtheora
#
# Only the symbols listed in the global section will be callable from
# applications linking to the libraries.
#
# We use something that looks like a versioned so filename here
# to define the old API because of a historical confusion. This
# label must be kept to maintain ABI compatibility.
libtheora.so.1.0
{
global:
theora_version_string;
theora_version_number;
theora_encode_init;
theora_encode_YUVin;
theora_encode_packetout;
theora_encode_header;
theora_encode_comment;
theora_encode_tables;
theora_decode_header;
theora_decode_init;
theora_decode_packetin;
theora_decode_YUVout;
theora_control;
theora_packet_isheader;
theora_packet_iskeyframe;
theora_granule_shift;
theora_granule_frame;
theora_granule_time;
theora_info_init;
theora_info_clear;
theora_clear;
theora_comment_init;
theora_comment_add;
theora_comment_add_tag;
theora_comment_query;
theora_comment_query_count;
theora_comment_clear;
local:
*;
};


@ -0,0 +1,82 @@
#
# Export file for libtheoradec
#
# Only the symbols listed in the global section will be callable from
# applications linking to the libraries.
#
# The 1.x API
libtheoradec_1.0
{
global:
th_version_string;
th_version_number;
th_decode_headerin;
th_decode_alloc;
th_setup_free;
th_decode_ctl;
th_decode_packetin;
th_decode_ycbcr_out;
th_decode_free;
th_packet_isheader;
th_packet_iskeyframe;
th_granule_frame;
th_granule_time;
th_info_init;
th_info_clear;
th_comment_init;
th_comment_add;
th_comment_add_tag;
th_comment_query;
th_comment_query_count;
th_comment_clear;
local:
*;
};
# The deprecated legacy api from the libtheora alpha releases.
# We use something that looks like a versioned so filename here
# to define the old API because of a historical confusion. This
# label must be kept to maintain ABI compatibility.
libtheora.so.1.0
{
global:
theora_version_string;
theora_version_number;
theora_decode_header;
theora_decode_init;
theora_decode_packetin;
theora_decode_YUVout;
theora_control;
theora_packet_isheader;
theora_packet_iskeyframe;
theora_granule_shift;
theora_granule_frame;
theora_granule_time;
theora_info_init;
theora_info_clear;
theora_clear;
theora_comment_init;
theora_comment_add;
theora_comment_add_tag;
theora_comment_query;
theora_comment_query_count;
theora_comment_clear;
local:
*;
};


@ -0,0 +1,43 @@
#
# Export file for libtheoraenc
#
# Only the symbols listed in the global section will be callable from
# applications linking to the libraries.
#
# The 1.x encoder API
libtheoraenc_1.0
{
global:
th_encode_alloc;
th_encode_ctl;
th_encode_flushheader;
th_encode_ycbcr_in;
th_encode_packetout;
th_encode_free;
TH_VP31_QUANT_INFO;
TH_VP31_HUFF_CODES;
local:
*;
};
# The encoder portion of the deprecated alpha release api.
# We use something that looks like a versioned so filename here
# to define the old API because of a historical confusion. This
# label must be kept to maintain ABI compatibility.
libtheora.so.1.0
{
global:
theora_encode_init;
theora_encode_YUVin;
theora_encode_packetout;
theora_encode_header;
theora_encode_comment;
theora_encode_tables;
local:
*;
};

File diff suppressed because it is too large.


@ -5,13 +5,13 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: apiwrapper.c 15400 2008-10-15 12:10:58Z tterribe $
last mod: $Id: apiwrapper.c 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
@ -47,10 +47,10 @@ void theora_info_clear(theora_info *_ci){
void theora_clear(theora_state *_th){
/*Provide compatibility with mixed encoder and decoder shared lib versions.*/
if(_th->internal_decode!=NULL){
(*((oc_state_dispatch_vtbl *)_th->internal_decode)->clear)(_th);
(*((oc_state_dispatch_vtable *)_th->internal_decode)->clear)(_th);
}
if(_th->internal_encode!=NULL){
(*((oc_state_dispatch_vtbl *)_th->internal_encode)->clear)(_th);
(*((oc_state_dispatch_vtable *)_th->internal_encode)->clear)(_th);
}
if(_th->i!=NULL)theora_info_clear(_th->i);
memset(_th,0,sizeof(*_th));
@ -59,11 +59,11 @@ void theora_clear(theora_state *_th){
int theora_control(theora_state *_th,int _req,void *_buf,size_t _buf_sz){
/*Provide compatibility with mixed encoder and decoder shared lib versions.*/
if(_th->internal_decode!=NULL){
return (*((oc_state_dispatch_vtbl *)_th->internal_decode)->control)(_th,
return (*((oc_state_dispatch_vtable *)_th->internal_decode)->control)(_th,
_req,_buf,_buf_sz);
}
else if(_th->internal_encode!=NULL){
return (*((oc_state_dispatch_vtbl *)_th->internal_encode)->control)(_th,
return (*((oc_state_dispatch_vtable *)_th->internal_encode)->control)(_th,
_req,_buf,_buf_sz);
}
else return TH_EINVAL;
@ -72,11 +72,11 @@ int theora_control(theora_state *_th,int _req,void *_buf,size_t _buf_sz){
ogg_int64_t theora_granule_frame(theora_state *_th,ogg_int64_t _gp){
/*Provide compatibility with mixed encoder and decoder shared lib versions.*/
if(_th->internal_decode!=NULL){
return (*((oc_state_dispatch_vtbl *)_th->internal_decode)->granule_frame)(
return (*((oc_state_dispatch_vtable *)_th->internal_decode)->granule_frame)(
_th,_gp);
}
else if(_th->internal_encode!=NULL){
return (*((oc_state_dispatch_vtbl *)_th->internal_encode)->granule_frame)(
return (*((oc_state_dispatch_vtable *)_th->internal_encode)->granule_frame)(
_th,_gp);
}
else return -1;
@ -85,11 +85,11 @@ ogg_int64_t theora_granule_frame(theora_state *_th,ogg_int64_t _gp){
double theora_granule_time(theora_state *_th, ogg_int64_t _gp){
/*Provide compatibility with mixed encoder and decoder shared lib versions.*/
if(_th->internal_decode!=NULL){
return (*((oc_state_dispatch_vtbl *)_th->internal_decode)->granule_time)(
return (*((oc_state_dispatch_vtable *)_th->internal_decode)->granule_time)(
_th,_gp);
}
else if(_th->internal_encode!=NULL){
return (*((oc_state_dispatch_vtbl *)_th->internal_encode)->granule_time)(
return (*((oc_state_dispatch_vtable *)_th->internal_encode)->granule_time)(
_th,_gp);
}
else return -1;


@ -5,7 +5,7 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
@ -20,9 +20,8 @@
# include <ogg/ogg.h>
# include <theora/theora.h>
# include "theora/theoradec.h"
/*# include "theora/theoraenc.h"*/
typedef struct th_enc_ctx th_enc_ctx;
# include "../internal.h"
# include "theora/theoraenc.h"
# include "internal.h"
typedef struct th_api_wrapper th_api_wrapper;
typedef struct th_api_info th_api_info;


@ -0,0 +1,111 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE OggTheora SOURCE CODE IS (C) COPYRIGHT 1994-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function: packing variable sized words into an octet stream
last mod: $Id: bitpack.c 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
#include <string.h>
#include <stdlib.h>
#include "bitpack.h"
/*We're 'MSb' endian; if we write a word but read individual bits,
then we'll read the MSb first.*/
void oc_pack_readinit(oc_pack_buf *_b,unsigned char *_buf,long _bytes){
memset(_b,0,sizeof(*_b));
_b->ptr=_buf;
_b->stop=_buf+_bytes;
}
static oc_pb_window oc_pack_refill(oc_pack_buf *_b,int _bits){
const unsigned char *ptr;
const unsigned char *stop;
oc_pb_window window;
int available;
window=_b->window;
available=_b->bits;
ptr=_b->ptr;
stop=_b->stop;
while(available<=OC_PB_WINDOW_SIZE-8&&ptr<stop){
available+=8;
window|=(oc_pb_window)*ptr++<<OC_PB_WINDOW_SIZE-available;
}
_b->ptr=ptr;
if(_bits>available){
if(ptr>=stop){
_b->eof=1;
available=OC_LOTS_OF_BITS;
}
else window|=*ptr>>(available&7);
}
_b->bits=available;
return window;
}
int oc_pack_look1(oc_pack_buf *_b){
oc_pb_window window;
int available;
window=_b->window;
available=_b->bits;
if(available<1)_b->window=window=oc_pack_refill(_b,1);
return window>>OC_PB_WINDOW_SIZE-1;
}
void oc_pack_adv1(oc_pack_buf *_b){
_b->window<<=1;
_b->bits--;
}
/*Here we assume that 0<=_bits&&_bits<=32.*/
long oc_pack_read(oc_pack_buf *_b,int _bits){
oc_pb_window window;
int available;
long result;
window=_b->window;
available=_b->bits;
if(_bits==0)return 0;
if(available<_bits){
window=oc_pack_refill(_b,_bits);
available=_b->bits;
}
result=window>>OC_PB_WINDOW_SIZE-_bits;
available-=_bits;
window<<=1;
window<<=_bits-1;
_b->bits=available;
_b->window=window;
return result;
}
int oc_pack_read1(oc_pack_buf *_b){
oc_pb_window window;
int available;
int result;
window=_b->window;
available=_b->bits;
if(available<1){
window=oc_pack_refill(_b,1);
available=_b->bits;
}
result=window>>OC_PB_WINDOW_SIZE-1;
available--;
window<<=1;
_b->bits=available;
_b->window=window;
return result;
}
long oc_pack_bytes_left(oc_pack_buf *_b){
if(_b->eof)return -1;
return _b->stop-_b->ptr+(_b->bits>>3);
}
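
The window-based reader above can be illustrated with a minimal sketch. It is not the libtheora implementation: the `demo_` names are hypothetical, reads are limited to 16 bits so the example also works where `unsigned long` is 32 bits wide, and running out of input returns -1 instead of setting an `eof` flag and padding with zeros as `oc_pack_refill` does.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for oc_pb_window and OC_PB_WINDOW_SIZE. */
typedef unsigned long demo_window;
#define DEMO_WINDOW_BITS ((int)sizeof(demo_window)*CHAR_BIT)

typedef struct{
  demo_window          window; /* Unread bits, left-aligned (MSb first). */
  const unsigned char *ptr;    /* Next byte to load into the window. */
  const unsigned char *end;    /* One past the last input byte. */
  int                  bits;   /* Number of valid bits in the window. */
}demo_reader;

static void demo_init(demo_reader *_r,const unsigned char *_buf,size_t _n){
  _r->window=0;
  _r->ptr=_buf;
  _r->end=_buf+_n;
  _r->bits=0;
}

/* Top up the window one byte at a time while a whole byte still fits. */
static void demo_refill(demo_reader *_r){
  while(_r->bits<=DEMO_WINDOW_BITS-8&&_r->ptr<_r->end){
    _r->bits+=8;
    _r->window|=(demo_window)*_r->ptr++<<(DEMO_WINDOW_BITS-_r->bits);
  }
}

/* Reads 1 to 16 bits, MSb first.
   Returns -1 when the input is exhausted (a simplification of this sketch). */
static long demo_read(demo_reader *_r,int _nbits){
  long result;
  assert(_nbits>=1&&_nbits<=16);
  if(_r->bits<_nbits)demo_refill(_r);
  if(_r->bits<_nbits)return -1;
  result=(long)(_r->window>>(DEMO_WINDOW_BITS-_nbits));
  _r->window<<=_nbits;
  _r->bits-=_nbits;
  return result;
}
```

Keeping the unread bits left-aligned means a read is a single right shift of the window, and the per-byte pointer bookkeeping of the old `theorapackB_*` functions is only paid during a refill.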


@ -5,7 +5,7 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE OggTheora SOURCE CODE IS (C) COPYRIGHT 1994-2008 *
* THE OggTheora SOURCE CODE IS (C) COPYRIGHT 1994-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
@ -16,23 +16,44 @@
********************************************************************/
#if !defined(_bitpack_H)
# define _bitpack_H (1)
# include <ogg/ogg.h>
# include <limits.h>
void theorapackB_readinit(oggpack_buffer *_b,unsigned char *_buf,int _bytes);
int theorapackB_look1(oggpack_buffer *_b,long *_ret);
void theorapackB_adv1(oggpack_buffer *_b);
typedef unsigned long oc_pb_window;
typedef struct oc_pack_buf oc_pack_buf;
# define OC_PB_WINDOW_SIZE ((int)sizeof(oc_pb_window)*CHAR_BIT)
/*This is meant to be a large, positive constant that can still be efficiently
loaded as an immediate (on platforms like ARM, for example).
Even relatively modest values like 100 would work fine.*/
# define OC_LOTS_OF_BITS (0x40000000)
struct oc_pack_buf{
oc_pb_window window;
const unsigned char *ptr;
const unsigned char *stop;
int bits;
int eof;
};
void oc_pack_readinit(oc_pack_buf *_b,unsigned char *_buf,long _bytes);
int oc_pack_look1(oc_pack_buf *_b);
void oc_pack_adv1(oc_pack_buf *_b);
/*Here we assume 0<=_bits&&_bits<=32.*/
int theorapackB_read(oggpack_buffer *_b,int _bits,long *_ret);
int theorapackB_read1(oggpack_buffer *_b,long *_ret);
long theorapackB_bytes(oggpack_buffer *_b);
long theorapackB_bits(oggpack_buffer *_b);
unsigned char *theorapackB_get_buffer(oggpack_buffer *_b);
long oc_pack_read(oc_pack_buf *_b,int _bits);
int oc_pack_read1(oc_pack_buf *_b);
/* returns -1 for read beyond EOF, or the number of whole bytes available */
long oc_pack_bytes_left(oc_pack_buf *_b);
/*These two functions are implemented locally in huffdec.c*/
/*Read in bits without advancing the bitptr.
Here we assume 0<=_bits&&_bits<=32.*/
/*static int theorapackB_look(oggpack_buffer *_b,int _bits,long *_ret);*/
/*static void theorapackB_adv(oggpack_buffer *_b,int _bits);*/
/*static int oc_pack_look(oc_pack_buf *_b,int _bits);*/
/*static void oc_pack_adv(oc_pack_buf *_b,int _bits);*/
#endif


@ -5,7 +5,7 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2008 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
@ -14,13 +14,13 @@
Originally written by Rudolf Marek.
function:
last mod: $Id: cpu.c 15427 2008-10-21 02:36:19Z xiphmont $
last mod: $Id: cpu.c 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
#include "cpu.h"
#if !defined(USE_ASM)
#if !defined(OC_X86_ASM)
static ogg_uint32_t oc_cpu_flags_get(void){
return 0;
}
@ -166,7 +166,7 @@ static ogg_uint32_t oc_cpu_flags_get(void){
/* D M A c i t n e h t u A*/
else if(ecx==0x444D4163&&edx==0x69746E65&&ebx==0x68747541||
/* C S N y b e d o e G*/
ecx==0x43534E20&&edx==0x79622065&&ebx==0x646F6547){
ecx==0x43534e20&&edx==0x79622065&&ebx==0x646f6547){
/*AMD, Geode:*/
cpuid(0x80000000,eax,ebx,ecx,edx);
if(eax<0x80000001)flags=0;
@ -192,7 +192,6 @@ static ogg_uint32_t oc_cpu_flags_get(void){
The C3-2 (Nehemiah) cores appear to, as well.*/
cpuid(1,eax,ebx,ecx,edx);
flags=oc_parse_intel_flags(edx,ecx);
cpuid(0x80000000,eax,ebx,ecx,edx);
if(eax>=0x80000001){
/*The (non-Nehemiah) C3 processors support AMD-like cpuid info.
We need to check this even if the Intel test succeeds to pick up 3DNow!


@ -5,12 +5,12 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: cpu.h 15430 2008-10-21 05:03:55Z giles $
last mod: $Id: cpu.h 16503 2009-08-22 18:14:02Z giles $
********************************************************************/


@ -5,13 +5,13 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dct.h 15400 2008-10-15 12:10:58Z tterribe $
last mod: $Id: dct.h 16503 2009-08-22 18:14:02Z giles $
********************************************************************/


@ -1,121 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE OggTheora SOURCE CODE IS (C) COPYRIGHT 1994-2008 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function: packing variable sized words into an octet stream
last mod: $Id: bitpack.c 15400 2008-10-15 12:10:58Z tterribe $
********************************************************************/
/*We're 'MSb' endian; if we write a word but read individual bits,
then we'll read the MSb first.*/
#include <string.h>
#include <stdlib.h>
#include "bitpack.h"
void theorapackB_readinit(oggpack_buffer *_b,unsigned char *_buf,int _bytes){
memset(_b,0,sizeof(*_b));
_b->buffer=_b->ptr=_buf;
_b->storage=_bytes;
}
int theorapackB_look1(oggpack_buffer *_b,long *_ret){
if(_b->endbyte>=_b->storage){
*_ret=0L;
return -1;
}
*_ret=(_b->ptr[0]>>7-_b->endbit)&1;
return 0;
}
void theorapackB_adv1(oggpack_buffer *_b){
if(++(_b->endbit)>7){
_b->endbit=0;
_b->ptr++;
_b->endbyte++;
}
}
/*Here we assume that 0<=_bits&&_bits<=32.*/
int theorapackB_read(oggpack_buffer *_b,int _bits,long *_ret){
long ret;
long m;
long d;
int fail;
m=32-_bits;
_bits+=_b->endbit;
d=_b->storage-_b->endbyte;
if(d<=4){
/*Not the main path.*/
if(d*8<_bits){
*_ret=0L;
fail=-1;
goto overflow;
}
/*Special case to avoid reading _b->ptr[0], which might be past the end of
the buffer; also skips some useless accounting.*/
else if(!_bits){
*_ret=0L;
return 0;
}
}
ret=_b->ptr[0]<<24+_b->endbit;
if(_bits>8){
ret|=_b->ptr[1]<<16+_b->endbit;
if(_bits>16){
ret|=_b->ptr[2]<<8+_b->endbit;
if(_bits>24){
ret|=_b->ptr[3]<<_b->endbit;
if(_bits>32)ret|=_b->ptr[4]>>8-_b->endbit;
}
}
}
*_ret=((ret&0xFFFFFFFFUL)>>(m>>1))>>(m+1>>1);
fail=0;
overflow:
_b->ptr+=_bits>>3;
_b->endbyte+=_bits>>3;
_b->endbit=_bits&7;
return fail;
}
int theorapackB_read1(oggpack_buffer *_b,long *_ret){
int fail;
if(_b->endbyte>=_b->storage){
/*Not the main path.*/
*_ret=0L;
fail=-1;
}
else{
*_ret=(_b->ptr[0]>>7-_b->endbit)&1;
fail=0;
}
_b->endbit++;
if(_b->endbit>7){
_b->endbit=0;
_b->ptr++;
_b->endbyte++;
}
return fail;
}
long theorapackB_bytes(oggpack_buffer *_b){
return _b->endbyte+(_b->endbit+7>>3);
}
long theorapackB_bits(oggpack_buffer *_b){
return _b->endbyte*8+_b->endbit;
}
unsigned char *theorapackB_get_buffer(oggpack_buffer *_b){
return _b->buffer;
}

File diff suppressed because it is too large.


@ -1,199 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: fragment.c 15469 2008-10-30 12:49:42Z tterribe $
********************************************************************/
#include "../internal.h"
void oc_frag_recon_intra(const oc_theora_state *_state,unsigned char *_dst,
int _dst_ystride,const ogg_int16_t *_residue){
_state->opt_vtable.frag_recon_intra(_dst,_dst_ystride,_residue);
}
void oc_frag_recon_intra_c(unsigned char *_dst,int _dst_ystride,
const ogg_int16_t *_residue){
int i;
for(i=0;i<8;i++){
int j;
for(j=0;j<8;j++){
int res;
res=*_residue++;
_dst[j]=OC_CLAMP255(res+128);
}
_dst+=_dst_ystride;
}
}
void oc_frag_recon_inter(const oc_theora_state *_state,unsigned char *_dst,
int _dst_ystride,const unsigned char *_src,int _src_ystride,
const ogg_int16_t *_residue){
_state->opt_vtable.frag_recon_inter(_dst,_dst_ystride,_src,_src_ystride,
_residue);
}
void oc_frag_recon_inter_c(unsigned char *_dst,int _dst_ystride,
const unsigned char *_src,int _src_ystride,const ogg_int16_t *_residue){
int i;
for(i=0;i<8;i++){
int j;
for(j=0;j<8;j++){
int res;
res=*_residue++;
_dst[j]=OC_CLAMP255(res+_src[j]);
}
_dst+=_dst_ystride;
_src+=_src_ystride;
}
}
void oc_frag_recon_inter2(const oc_theora_state *_state,unsigned char *_dst,
int _dst_ystride,const unsigned char *_src1,int _src1_ystride,
const unsigned char *_src2,int _src2_ystride,const ogg_int16_t *_residue){
_state->opt_vtable.frag_recon_inter2(_dst,_dst_ystride,_src1,_src1_ystride,
_src2,_src2_ystride,_residue);
}
void oc_frag_recon_inter2_c(unsigned char *_dst,int _dst_ystride,
const unsigned char *_src1,int _src1_ystride,const unsigned char *_src2,
int _src2_ystride,const ogg_int16_t *_residue){
int i;
for(i=0;i<8;i++){
int j;
for(j=0;j<8;j++){
int res;
res=*_residue++;
_dst[j]=OC_CLAMP255(res+((int)_src1[j]+_src2[j]>>1));
}
_dst+=_dst_ystride;
_src1+=_src1_ystride;
_src2+=_src2_ystride;
}
}
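
The bi-predicted reconstruction above can be reproduced in isolation. The following is a sketch only: `demo_clamp255` is a hypothetical stand-in for `OC_CLAMP255` (assumed here to saturate to the range 0..255), `short` replaces `ogg_int16_t`, and both predictors share one stride to keep the signature short.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed behavior of OC_CLAMP255: saturate to [0,255]. */
static unsigned char demo_clamp255(int _x){
  return (unsigned char)(_x<0?0:_x>255?255:_x);
}

/* Reconstructs one 8x8 block as the truncated average of two predictors
   plus a residue, clamped to the 8-bit pixel range. */
static void demo_recon_inter2(unsigned char *_dst,int _dst_ystride,
 const unsigned char *_src1,const unsigned char *_src2,int _src_ystride,
 const short *_residue){
  int i;
  int j;
  assert(_dst!=NULL&&_src1!=NULL&&_src2!=NULL&&_residue!=NULL);
  for(i=0;i<8;i++){
    for(j=0;j<8;j++){
      int res;
      res=*_residue++;
      _dst[j]=demo_clamp255(res+((_src1[j]+_src2[j])>>1));
    }
    _dst+=_dst_ystride;
    _src1+=_src_ystride;
    _src2+=_src_ystride;
  }
}
```

The average truncates rather than rounds, matching the `>>1` in `oc_frag_recon_inter2_c`, and the clamp is what keeps a large residue from wrapping around in the 8-bit destination.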
/*Computes the predicted DC value for the given fragment.
This requires that the fully decoded DC values be available for the left,
upper-left, upper, and upper-right fragments (if they exist).
_frag: The fragment to predict the DC value for.
_fplane: The fragment plane the fragment belongs to.
_x: The x-coordinate of the fragment.
_y: The y-coordinate of the fragment.
_pred_last: The last fully-decoded DC value for each predictor frame
(OC_FRAME_GOLD, OC_FRAME_PREV and OC_FRAME_SELF).
This should be initialized to 0's for the first fragment in each
color plane.
Return: The predicted DC value for this fragment.*/
int oc_frag_pred_dc(const oc_fragment *_frag,
const oc_fragment_plane *_fplane,int _x,int _y,int _pred_last[3]){
static const int PRED_SCALE[16][4]={
/*0*/
{0,0,0,0},
/*OC_PL*/
{1,0,0,0},
/*OC_PUL*/
{1,0,0,0},
/*OC_PL|OC_PUL*/
{1,0,0,0},
/*OC_PU*/
{1,0,0,0},
/*OC_PL|OC_PU*/
{1,1,0,0},
/*OC_PUL|OC_PU*/
{0,1,0,0},
/*OC_PL|OC_PUL|OC_PU*/
{29,-26,29,0},
/*OC_PUR*/
{1,0,0,0},
/*OC_PL|OC_PUR*/
{75,53,0,0},
/*OC_PUL|OC_PUR*/
{1,1,0,0},
/*OC_PL|OC_PUL|OC_PUR*/
{75,0,53,0},
/*OC_PU|OC_PUR*/
{1,0,0,0},
/*OC_PL|OC_PU|OC_PUR*/
{75,0,53,0},
/*OC_PUL|OC_PU|OC_PUR*/
{3,10,3,0},
/*OC_PL|OC_PUL|OC_PU|OC_PUR*/
{29,-26,29,0}
};
static const int PRED_SHIFT[16]={0,0,0,0,0,1,0,5,0,7,1,7,0,7,4,5};
static const int PRED_RMASK[16]={0,0,0,0,0,1,0,31,0,127,1,127,0,127,15,31};
static const int BC_MASK[8]={
/*No boundary condition.*/
OC_PL|OC_PUL|OC_PU|OC_PUR,
/*Left column.*/
OC_PU|OC_PUR,
/*Top row.*/
OC_PL,
/*Top row, left column.*/
0,
/*Right column.*/
OC_PL|OC_PUL|OC_PU,
/*Right and left column.*/
OC_PU,
/*Top row, right column.*/
OC_PL,
/*Top row, right and left column.*/
0
};
/*Predictor fragments, left, up-left, up, up-right.*/
const oc_fragment *predfr[4];
/*The frame used for prediction for this fragment.*/
int pred_frame;
/*The boundary condition flags.*/
int bc;
/*DC predictor values: left, up-left, up, up-right, missing values
skipped.*/
int p[4];
/*Predictor count.*/
int np;
/*Which predictor constants to use.*/
int pflags;
/*The predicted DC value.*/
int ret;
int i;
pred_frame=OC_FRAME_FOR_MODE[_frag->mbmode];
bc=(_x==0)+((_y==0)<<1)+((_x+1==_fplane->nhfrags)<<2);
predfr[0]=_frag-1;
predfr[1]=_frag-_fplane->nhfrags-1;
predfr[2]=predfr[1]+1;
predfr[3]=predfr[2]+1;
np=0;
pflags=0;
for(i=0;i<4;i++){
int pflag;
pflag=1<<i;
if((BC_MASK[bc]&pflag)&&predfr[i]->coded&&
OC_FRAME_FOR_MODE[predfr[i]->mbmode]==pred_frame){
p[np++]=predfr[i]->dc;
pflags|=pflag;
}
}
if(pflags==0)return _pred_last[pred_frame];
else{
ret=PRED_SCALE[pflags][0]*p[0];
/*LOOP VECTORIZES.*/
for(i=1;i<np;i++)ret+=PRED_SCALE[pflags][i]*p[i];
ret=OC_DIV_POW2(ret,PRED_SHIFT[pflags],PRED_RMASK[pflags]);
}
if((pflags&(OC_PL|OC_PUL|OC_PU))==(OC_PL|OC_PUL|OC_PU)){
if(abs(ret-p[2])>128)ret=p[2];
else if(abs(ret-p[0])>128)ret=p[0];
else if(abs(ret-p[1])>128)ret=p[1];
}
return ret;
}
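
As a worked instance of the predictor above, the sketch below covers only the case where the left, up-left, and up neighbors are all available (weights 29, -26, 29 and a shift of 5 in the tables). The `demo_` helpers are hypothetical, and the division is assumed to truncate toward zero, which is what the `PRED_RMASK` values passed to `OC_DIV_POW2` suggest; that macro's definition is not part of this diff.

```c
#include <assert.h>
#include <stdlib.h>

/* Divide by 2^_shift, truncating toward zero (assumed to match
   OC_DIV_POW2, whose definition is not shown in the diff). */
static int demo_div_pow2(int _v,int _shift){
  assert(_shift>=0&&_shift<31);
  return _v<0?-((-_v)>>_shift):_v>>_shift;
}

/* DC prediction from the left, up-left, and up neighbors:
   (29*left-26*upleft+29*up)/32, with a fallback to a single neighbor
   when the weighted sum strays more than 128 from it. */
static int demo_pred_dc3(int _left,int _upleft,int _up){
  int ret;
  ret=demo_div_pow2(29*_left-26*_upleft+29*_up,5);
  if(abs(ret-_up)>128)ret=_up;
  else if(abs(ret-_left)>128)ret=_left;
  else if(abs(ret-_upleft)>128)ret=_upleft;
  return ret;
}
```

The three weights sum to 32, so a flat neighborhood predicts its own value; the outlier test exists because the negative up-left weight can push the sum well outside the range of its inputs.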


@ -1,325 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: huffdec.c 15431 2008-10-21 05:04:02Z giles $
********************************************************************/
#include <stdlib.h>
#include <ogg/ogg.h>
#include "huffdec.h"
#include "decint.h"
/*The ANSI offsetof macro is broken on some platforms (e.g., older DECs).*/
#define _ogg_offsetof(_type,_field)\
((size_t)((char *)&((_type *)0)->_field-(char *)0))
/*These two functions are really part of the bitpack.c module, but
they are only used here. Declaring local static versions so they
can be inlined saves considerable function call overhead.*/
/*Read in bits without advancing the bitptr.
Here we assume 0<=_bits&&_bits<=32.*/
static int theorapackB_look(oggpack_buffer *_b,int _bits,long *_ret){
long ret;
long m;
long d;
m=32-_bits;
_bits+=_b->endbit;
d=_b->storage-_b->endbyte;
if(d<=4){
/*Not the main path.*/
if(d<=0){
*_ret=0L;
return -(_bits>d*8);
}
/*If we have some bits left, but not enough, return the ones we have.*/
if(d*8<_bits)_bits=d*8;
}
ret=_b->ptr[0]<<24+_b->endbit;
if(_bits>8){
ret|=_b->ptr[1]<<16+_b->endbit;
if(_bits>16){
ret|=_b->ptr[2]<<8+_b->endbit;
if(_bits>24){
ret|=_b->ptr[3]<<_b->endbit;
if(_bits>32)ret|=_b->ptr[4]>>8-_b->endbit;
}
}
}
*_ret=((ret&0xFFFFFFFF)>>(m>>1))>>(m+1>>1);
return 0;
}
/*advance the bitptr*/
static void theorapackB_adv(oggpack_buffer *_b,int _bits){
_bits+=_b->endbit;
_b->ptr+=_bits>>3;
_b->endbyte+=_bits>>3;
_b->endbit=_bits&7;
}
/*The log_2 of the size of a lookup table is allowed to grow relative to
the number of unique nodes it contains.
E.g., if OC_HUFF_SLUSH is 2, then at most 75% of the space in the tree is
wasted (each node will have an amortized cost of at most 20 bytes when using
4-byte pointers).
Larger numbers can decode tokens with fewer read operations, while smaller
numbers may save more space (requiring as little as 8 bytes amortized per
node, though there will be more nodes).
With a sample file:
32233473 read calls are required when no tree collapsing is done (100.0%).
19269269 read calls are required when OC_HUFF_SLUSH is 0 (59.8%).
11144969 read calls are required when OC_HUFF_SLUSH is 1 (34.6%).
10538563 read calls are required when OC_HUFF_SLUSH is 2 (32.7%).
10192578 read calls are required when OC_HUFF_SLUSH is 3 (31.6%).
Since a value of 1 gets us the vast majority of the speed-up with only a
small amount of wasted memory, this is what we use.*/
#define OC_HUFF_SLUSH (1)
/*Allocates a Huffman tree node that represents a subtree of depth _nbits.
_nbits: The depth of the subtree.
If this is 0, the node is a leaf node.
Otherwise 1<<_nbits pointers are allocated for children.
Return: The newly allocated and fully initialized Huffman tree node.*/
static oc_huff_node *oc_huff_node_alloc(int _nbits){
oc_huff_node *ret;
size_t size;
size=_ogg_offsetof(oc_huff_node,nodes);
if(_nbits>0)size+=sizeof(oc_huff_node *)*(1<<_nbits);
ret=_ogg_calloc(1,size);
ret->nbits=(unsigned char)_nbits;
return ret;
}
/*Frees a Huffman tree node allocated with oc_huff_node_alloc.
_node: The node to free.
This may be NULL.*/
static void oc_huff_node_free(oc_huff_node *_node){
_ogg_free(_node);
}
/*Frees the memory used by a Huffman tree.
_node: The Huffman tree to free.
This may be NULL.*/
static void oc_huff_tree_free(oc_huff_node *_node){
if(_node==NULL)return;
if(_node->nbits){
int nchildren;
int i;
int inext;
nchildren=1<<_node->nbits;
for(i=0;i<nchildren;i=inext){
inext=i+(_node->nodes[i]!=NULL?1<<_node->nbits-_node->nodes[i]->depth:1);
oc_huff_tree_free(_node->nodes[i]);
}
}
oc_huff_node_free(_node);
}
/*Unpacks a sub-tree from the given buffer.
_opb: The buffer to unpack from.
_binode: The location to store a pointer to the sub-tree in.
_depth: The current depth of the tree.
This is used to prevent infinite recursion.
Return: 0 on success, or a negative value on error.*/
static int oc_huff_tree_unpack(oggpack_buffer *_opb,
oc_huff_node **_binode,int _depth){
oc_huff_node *binode;
long bits;
/*Prevent infinite recursion.*/
if(++_depth>32)return TH_EBADHEADER;
if(theorapackB_read1(_opb,&bits)<0)return TH_EBADHEADER;
/*Read an internal node:*/
if(!bits){
int ret;
binode=oc_huff_node_alloc(1);
binode->depth=(unsigned char)(_depth>1);
ret=oc_huff_tree_unpack(_opb,binode->nodes,_depth);
if(ret>=0)ret=oc_huff_tree_unpack(_opb,binode->nodes+1,_depth);
if(ret<0){
oc_huff_tree_free(binode);
*_binode=NULL;
return ret;
}
}
/*Read a leaf node:*/
else{
if(theorapackB_read(_opb,OC_NDCT_TOKEN_BITS,&bits)<0)return TH_EBADHEADER;
binode=oc_huff_node_alloc(0);
binode->depth=(unsigned char)(_depth>1);
binode->token=(unsigned char)bits;
}
*_binode=binode;
return 0;
}
/*Finds the depth of the shortest branch of the given sub-tree.
The tree must be binary.
_binode: The root of the given sub-tree.
_binode->nbits must be 0 or 1.
Return: The smallest depth of a leaf node in this sub-tree.
0 indicates this sub-tree is a leaf node.*/
static int oc_huff_tree_mindepth(oc_huff_node *_binode){
int depth0;
int depth1;
if(_binode->nbits==0)return 0;
depth0=oc_huff_tree_mindepth(_binode->nodes[0]);
depth1=oc_huff_tree_mindepth(_binode->nodes[1]);
return OC_MINI(depth0,depth1)+1;
}
/*Finds the number of internal nodes at a given depth, plus the number of
leaves at that depth or shallower.
The tree must be binary.
_binode: The root of the given sub-tree.
_binode->nbits must be 0 or 1.
Return: The number of entries that would be contained in a jump table of the
given depth.*/
static int oc_huff_tree_occupancy(oc_huff_node *_binode,int _depth){
if(_binode->nbits==0||_depth<=0)return 1;
else{
return oc_huff_tree_occupancy(_binode->nodes[0],_depth-1)+
oc_huff_tree_occupancy(_binode->nodes[1],_depth-1);
}
}
static oc_huff_node *oc_huff_tree_collapse(oc_huff_node *_binode);
/*Fills the given nodes table with all the children in the sub-tree at the
given depth.
The nodes in the sub-tree with a depth less than that stored in the table
are freed.
The sub-tree must be binary and complete up until the given depth.
_nodes: The nodes table to fill.
_binode: The root of the sub-tree to fill it with.
_binode->nbits must be 0 or 1.
_level: The current level in the table.
0 indicates that the current node should be stored, regardless of
whether it is a leaf node or an internal node.
_depth: The depth of the nodes to fill the table with, relative to their
parent.*/
static void oc_huff_node_fill(oc_huff_node **_nodes,
oc_huff_node *_binode,int _level,int _depth){
if(_level<=0||_binode->nbits==0){
int i;
_binode->depth=(unsigned char)(_depth-_level);
_nodes[0]=oc_huff_tree_collapse(_binode);
for(i=1;i<1<<_level;i++)_nodes[i]=_nodes[0];
}
else{
_level--;
oc_huff_node_fill(_nodes,_binode->nodes[0],_level,_depth);
oc_huff_node_fill(_nodes+(1<<_level),_binode->nodes[1],_level,_depth);
oc_huff_node_free(_binode);
}
}
/*Finds the largest complete sub-tree rooted at the current node and collapses
it into a single node.
This procedure is then applied recursively to all the children of that node.
_binode: The root of the sub-tree to collapse.
_binode->nbits must be 0 or 1.
Return: The new root of the collapsed sub-tree.*/
static oc_huff_node *oc_huff_tree_collapse(oc_huff_node *_binode){
oc_huff_node *root;
int mindepth;
int depth;
int loccupancy;
int occupancy;
depth=mindepth=oc_huff_tree_mindepth(_binode);
occupancy=1<<mindepth;
do{
loccupancy=occupancy;
occupancy=oc_huff_tree_occupancy(_binode,++depth);
}
while(occupancy>loccupancy&&occupancy>=1<<OC_MAXI(depth-OC_HUFF_SLUSH,0));
depth--;
if(depth<=1)return _binode;
root=oc_huff_node_alloc(depth);
root->depth=_binode->depth;
oc_huff_node_fill(root->nodes,_binode,depth,depth);
return root;
}
/*Makes a copy of the given Huffman tree.
_node: The Huffman tree to copy.
Return: The copy of the Huffman tree.*/
static oc_huff_node *oc_huff_tree_copy(const oc_huff_node *_node){
oc_huff_node *ret;
ret=oc_huff_node_alloc(_node->nbits);
ret->depth=_node->depth;
if(_node->nbits){
int nchildren;
int i;
int inext;
nchildren=1<<_node->nbits;
for(i=0;i<nchildren;){
ret->nodes[i]=oc_huff_tree_copy(_node->nodes[i]);
inext=i+(1<<_node->nbits-ret->nodes[i]->depth);
while(++i<inext)ret->nodes[i]=ret->nodes[i-1];
}
}
else ret->token=_node->token;
return ret;
}
/*Unpacks a set of Huffman trees, and reduces them to a collapsed
representation.
_opb: The buffer to unpack the trees from.
_nodes: The table to fill with the Huffman trees.
Return: 0 on success, or a negative value on error.*/
int oc_huff_trees_unpack(oggpack_buffer *_opb,
oc_huff_node *_nodes[TH_NHUFFMAN_TABLES]){
int i;
for(i=0;i<TH_NHUFFMAN_TABLES;i++){
int ret;
ret=oc_huff_tree_unpack(_opb,_nodes+i,0);
if(ret<0)return ret;
_nodes[i]=oc_huff_tree_collapse(_nodes[i]);
}
return 0;
}
/*Makes a copy of the given set of Huffman trees.
_dst: The array to store the copy in.
_src: The array of trees to copy.*/
void oc_huff_trees_copy(oc_huff_node *_dst[TH_NHUFFMAN_TABLES],
const oc_huff_node *const _src[TH_NHUFFMAN_TABLES]){
int i;
for(i=0;i<TH_NHUFFMAN_TABLES;i++)_dst[i]=oc_huff_tree_copy(_src[i]);
}
/*Frees the memory used by a set of Huffman trees.
_nodes: The array of trees to free.*/
void oc_huff_trees_clear(oc_huff_node *_nodes[TH_NHUFFMAN_TABLES]){
int i;
for(i=0;i<TH_NHUFFMAN_TABLES;i++)oc_huff_tree_free(_nodes[i]);
}
/*Unpacks a single token using the given Huffman tree.
_opb: The buffer to unpack the token from.
_node: The tree to unpack the token with.
Return: The token value.*/
int oc_huff_token_decode(oggpack_buffer *_opb,const oc_huff_node *_node){
long bits;
while(_node->nbits!=0){
theorapackB_look(_opb,_node->nbits,&bits);
_node=_node->nodes[bits];
theorapackB_adv(_opb,_node->depth);
}
return _node->token;
}
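
The table-driven lookup in oc_huff_token_decode() can be illustrated with a small standalone sketch. Everything prefixed sk_ below is a hypothetical stand-in (a simplified node type and a toy MSB-first bit reader replace oc_huff_node and the theorapackB_* calls); it is not the library's code, only an assumed minimal model of the same idea: look ahead nbits, index the child table, then advance by the child's depth, which may be less than nbits for short codes.

```c
#include <stddef.h>

/*Hypothetical, simplified stand-in for oc_huff_node (at most 2 lookup bits).*/
typedef struct sk_node sk_node;
struct sk_node{
  int            nbits;    /*Bits to look up at this node; 0 marks a leaf.*/
  int            depth;    /*Bits actually consumed on the way to this node.*/
  int            token;    /*Token value; meaningful for leaves only.*/
  const sk_node *nodes[4]; /*1<<nbits children; short codes are repeated.*/
};

/*Hypothetical MSB-first bit reader standing in for the theorapackB_* calls.*/
typedef struct{
  const unsigned char *buf;
  size_t               nbytes;
  size_t               bitpos;
}sk_bits;

/*Peeks at the next _n bits without consuming them.
  Bits past the end of the buffer read as 0.*/
static long sk_look(const sk_bits *_b,int _n){
  long v;
  int  i;
  v=0;
  for(i=0;i<_n;i++){
    size_t p;
    int    bit;
    p=_b->bitpos+(size_t)i;
    bit=p/8<_b->nbytes?(_b->buf[p/8]>>(7-p%8))&1:0;
    v=(v<<1)|bit;
  }
  return v;
}

/*Same loop shape as oc_huff_token_decode(): look, index, advance by depth.*/
static int sk_token_decode(sk_bits *_b,const sk_node *_node){
  while(_node->nbits!=0){
    _node=_node->nodes[sk_look(_b,_node->nbits)];
    _b->bitpos+=(size_t)_node->depth;
  }
  return _node->token;
}

/*Demo tree for the codes 0->10, 10->11, 11->12, collapsed into one 2-bit
   table; the 1-bit code fills both entries that start with a 0 bit.*/
static const sk_node SK_LEAF_A={0,1,10,{NULL,NULL,NULL,NULL}};
static const sk_node SK_LEAF_B={0,2,11,{NULL,NULL,NULL,NULL}};
static const sk_node SK_LEAF_C={0,2,12,{NULL,NULL,NULL,NULL}};
static const sk_node SK_ROOT={2,0,0,
 {&SK_LEAF_A,&SK_LEAF_A,&SK_LEAF_B,&SK_LEAF_C}};
```

Repeating a short code's leaf in every table slot that shares its prefix is what lets the decoder read a fixed number of bits per step and still consume only the bits the code actually uses.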


@ -1,26 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: idct.h 15400 2008-10-15 12:10:58Z tterribe $
********************************************************************/
/*Inverse DCT transforms.*/
#include <ogg/ogg.h>
#if !defined(_idct_H)
# define _idct_H (1)
void oc_idct8x8_c(ogg_int16_t _y[64],const ogg_int16_t _x[64]);
void oc_idct8x8_10_c(ogg_int16_t _y[64],const ogg_int16_t _x[64]);
#endif


@ -1,88 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: ocintrin.h 15400 2008-10-15 12:10:58Z tterribe $
********************************************************************/
/*Some common macros for potential platform-specific optimization.*/
#include <math.h>
#if !defined(_ocintrin_H)
# define _ocintrin_H (1)
/*Some specific platforms may have optimized intrinsic or inline assembly
versions of these functions which can substantially improve performance.
We define macros for them to allow easy incorporation of these non-ANSI
features.*/
/*Branchless, but not correct for differences larger than INT_MAX.
static int oc_mini(int _a,int _b){
int ambsign;
ambsign=_a-_b>>sizeof(int)*8-1;
return (_a&~ambsign)+(_b&ambsign);
}*/
#define OC_MAXI(_a,_b) ((_a)<(_b)?(_b):(_a))
#define OC_MINI(_a,_b) ((_a)>(_b)?(_b):(_a))
/*Clamps an integer into the given range.
If _a>_c, then the lower bound _a is respected over the upper bound _c (this
behavior is required to meet our documented API behavior).
_a: The lower bound.
_b: The value to clamp.
_c: The upper bound.*/
#define OC_CLAMPI(_a,_b,_c) (OC_MAXI(_a,OC_MINI(_b,_c)))
#define OC_CLAMP255(_x) ((unsigned char)((((_x)<0)-1)&((_x)|-((_x)>255))))
/*Divides an integer by a power of two, truncating towards 0.
_dividend: The integer to divide.
_shift: The non-negative power of two to divide by.
_rmask: (1<<_shift)-1*/
#define OC_DIV_POW2(_dividend,_shift,_rmask)\
((_dividend)+(((_dividend)>>sizeof(_dividend)*8-1)&(_rmask))>>(_shift))
/*Divides _x by 65536, truncating towards 0.*/
#define OC_DIV2_16(_x) OC_DIV_POW2(_x,16,0xFFFF)
/*Divides _x by 2, truncating towards 0.*/
#define OC_DIV2(_x) OC_DIV_POW2(_x,1,0x1)
/*Divides _x by 8, truncating towards 0.*/
#define OC_DIV8(_x) OC_DIV_POW2(_x,3,0x7)
/*Divides _x by 16, truncating towards 0.*/
#define OC_DIV16(_x) OC_DIV_POW2(_x,4,0xF)
/*Right shifts _dividend by _shift, adding _rval, and subtracting one for
negative dividends first.
When _rval is (1<<_shift-1), this is equivalent to division with rounding
ties away from zero.*/
#define OC_DIV_ROUND_POW2(_dividend,_shift,_rval)\
((_dividend)+((_dividend)>>sizeof(_dividend)*8-1)+(_rval)>>(_shift))
/*Swaps two integers _a and _b if _a>_b.*/
#define OC_SORT2I(_a,_b)\
if((_a)>(_b)){\
int t__;\
t__=(_a);\
(_a)=(_b);\
(_b)=t__;\
}
/*All of these macros should expect floats as arguments.*/
#define OC_MAXF(_a,_b) ((_a)<(_b)?(_b):(_a))
#define OC_MINF(_a,_b) ((_a)>(_b)?(_b):(_a))
#define OC_CLAMPF(_a,_b,_c) (OC_MINF(_a,OC_MAXF(_b,_c)))
#define OC_FABSF(_f) ((float)fabs(_f))
#define OC_SQRTF(_f) ((float)sqrt(_f))
#define OC_POWF(_b,_e) ((float)pow(_b,_e))
#define OC_LOGF(_f) ((float)log(_f))
#define OC_IFLOORF(_f) ((int)floor(_f))
#define OC_ICEILF(_f) ((int)ceil(_f))
#endif
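
As a rough illustration of what OC_DIV_POW2 and OC_CLAMP255 compute, here is a standalone sketch written as functions instead of macros. The sk_ names are hypothetical and not part of the library. Like the macros, the division helper assumes that >> on a negative int is an arithmetic shift, which C leaves implementation-defined.

```c
/*Divides _x by 1<<_shift, truncating towards 0.
  The sign word is -1 for negative inputs and 0 otherwise, so negative
   dividends get (1<<_shift)-1 added before the shift.
  Assumes >> on a negative int is an arithmetic shift.*/
static int sk_div_pow2(int _x,int _shift){
  int sign;
  sign=_x>>(int)(sizeof(_x)*8-1);
  return (_x+(sign&((1<<_shift)-1)))>>_shift;
}

/*Branchless clamp to 0...255, using the same expression as OC_CLAMP255.
  (_x<0)-1 is 0 for negative inputs and all ones otherwise;
   -(_x>255) is all ones when the input overflows a byte.*/
static unsigned char sk_clamp255(int _x){
  return (unsigned char)(((_x<0)-1)&(_x|-(_x>255)));
}
```

A plain shift would round -9/8 down to -2; the added bias is what makes the result -1, matching C's own integer division.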


@ -1,122 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: quant.c 15400 2008-10-15 12:10:58Z tterribe $
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include <ogg/ogg.h>
#include "quant.h"
#include "decint.h"
static const unsigned OC_DC_QUANT_MIN[2]={4<<2,8<<2};
static const unsigned OC_AC_QUANT_MIN[2]={2<<2,4<<2};
/*Initializes the dequantization tables from a set of quantizer info.
Currently the dequantizer (and elsewhere enquantizer) tables are expected to
be initialized as pointing to the storage reserved for them in the
oc_theora_state (resp. oc_enc_ctx) structure.
If some tables are duplicates of others, the pointers will be adjusted to
point to a single copy of the tables, but the storage for them will not be
freed.
If you're concerned about the memory footprint, the obvious thing to do is
to move the storage out of its fixed place in the structures and allocate
it on demand.
However, a much, much better option is to only store the quantization
matrices being used for the current frame, and to recalculate these as the
qi values change between frames (this is what VP3 did).*/
void oc_dequant_tables_init(oc_quant_table *_dequant[2][3],
int _pp_dc_scale[64],const th_quant_info *_qinfo){
/*coding mode: intra or inter.*/
int qti;
/*Y', C_b, C_r*/
int pli;
for(qti=0;qti<2;qti++){
for(pli=0;pli<3;pli++){
oc_quant_tables stage;
/*Quality index.*/
int qi;
/*Range iterator.*/
int qri;
for(qi=0,qri=0; qri<=_qinfo->qi_ranges[qti][pli].nranges; qri++){
th_quant_base base;
ogg_uint32_t q;
int qi_start;
int qi_end;
int ci;
memcpy(base,_qinfo->qi_ranges[qti][pli].base_matrices[qri],
sizeof(base));
qi_start=qi;
if(qri==_qinfo->qi_ranges[qti][pli].nranges)qi_end=qi+1;
else qi_end=qi+_qinfo->qi_ranges[qti][pli].sizes[qri];
/*Iterate over quality indices in this range.*/
for(;;){
ogg_uint32_t qfac;
/*In the original VP3.2 code, the rounding offset and the size of the
dead zone around 0 were controlled by a "sharpness" parameter.
The size of our dead zone is now controlled by the per-coefficient
quality thresholds returned by our HVS module.
We round down from a more accurate value when the quality of the
reconstruction does not fall below our threshold and it saves bits.
Hence, all of that VP3.2 code is gone from here, and the remaining
floating point code has been implemented as equivalent integer code
with exact precision.*/
qfac=(ogg_uint32_t)_qinfo->dc_scale[qi]*base[0];
/*For postprocessing, not dequantization.*/
if(_pp_dc_scale!=NULL)_pp_dc_scale[qi]=(int)(qfac/160);
/*Scale the DC coefficient from the proper table.*/
q=(qfac/100)<<2;
q=OC_CLAMPI(OC_DC_QUANT_MIN[qti],q,OC_QUANT_MAX);
stage[qi][0]=(ogg_uint16_t)q;
/*Now scale AC coefficients from the proper table.*/
for(ci=1;ci<64;ci++){
q=((ogg_uint32_t)_qinfo->ac_scale[qi]*base[ci]/100)<<2;
q=OC_CLAMPI(OC_AC_QUANT_MIN[qti],q,OC_QUANT_MAX);
stage[qi][ci]=(ogg_uint16_t)q;
}
if(++qi>=qi_end)break;
/*Interpolate the next base matrix.*/
for(ci=0;ci<64;ci++){
base[ci]=(unsigned char)(
(2*((qi_end-qi)*_qinfo->qi_ranges[qti][pli].base_matrices[qri][ci]+
(qi-qi_start)*_qinfo->qi_ranges[qti][pli].base_matrices[qri+1][ci])
+_qinfo->qi_ranges[qti][pli].sizes[qri])/
(2*_qinfo->qi_ranges[qti][pli].sizes[qri]));
}
}
}
/*Staging matrices complete; commit to memory only if this isn't a
duplicate of a preceding plane.
This simple check helps us improve cache coherency later.*/
{
int dupe;
int qtj;
int plj;
dupe=0;
for(qtj=0;qtj<=qti;qtj++){
for(plj=0;plj<(qtj<qti?3:pli);plj++){
if(!memcmp(stage,_dequant[qtj][plj],sizeof(stage))){
dupe=1;
break;
}
}
if(dupe)break;
}
if(dupe)_dequant[qti][pli]=_dequant[qtj][plj];
else memcpy(_dequant[qti][pli],stage,sizeof(stage));
}
}
}
}
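
The two integer computations inside oc_dequant_tables_init() (scaling a base matrix entry into a quantizer, and interpolating between two base matrices with rounding) can be tried in isolation with the sketch below. The sk_ helpers are hypothetical, and SK_QUANT_MAX is only a placeholder: the real OC_QUANT_MAX comes from quant.h, which is not shown in this diff.

```c
/*Placeholder upper bound (an assumption); the real OC_QUANT_MAX is defined
   in quant.h.*/
#define SK_QUANT_MAX (1024<<2)

/*Scales one base matrix entry: (scale*base/100)<<2, capped at the maximum
   and then raised to the minimum, so the lower bound wins on conflict
   (the same order as OC_CLAMPI).*/
static unsigned sk_scale_quant(unsigned _scale,unsigned _base,unsigned _qmin){
  unsigned q;
  q=(_scale*_base/100)<<2;
  if(q>SK_QUANT_MAX)q=SK_QUANT_MAX;
  if(q<_qmin)q=_qmin;
  return q;
}

/*Linearly interpolates between base matrix entries _b0 (at _qi_start) and
   _b1 (at _qi_start+_size), rounding to nearest.
  Adding _size to the doubled numerator before dividing by 2*_size is the
   integer form of adding one half.*/
static int sk_interp_base(int _b0,int _b1,int _qi_start,int _size,int _qi){
  int qi_end;
  qi_end=_qi_start+_size;
  return (2*((qi_end-_qi)*_b0+(_qi-_qi_start)*_b1)+_size)/(2*_size);
}
```

For example, halfway between entries 10 and 11 the exact value is 10.5, and the rounding term makes the integer result 11 instead of 10.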


@ -1,653 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: mmxstate.c 15400 2008-10-15 12:10:58Z tterribe $
********************************************************************/
/*MMX acceleration of complete fragment reconstruction algorithm.
Originally written by Rudolf Marek.*/
#include "x86int.h"
#include "../../internal.h"
#include <stddef.h>
#if defined(USE_ASM)
static const __attribute__((aligned(8),used)) int OC_FZIG_ZAGMMX[64]={
0, 8, 1, 2, 9,16,24,17,
10, 3,32,11,18,25, 4,12,
5,26,19,40,33,34,41,48,
27, 6,13,20,28,21,14, 7,
56,49,42,35,43,50,57,36,
15,22,29,30,23,44,37,58,
51,59,38,45,52,31,60,53,
46,39,47,54,61,62,55,63
};
void oc_state_frag_recon_mmx(oc_theora_state *_state,oc_fragment *_frag,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,int _ncoefs,
ogg_uint16_t _dc_iquant,const ogg_uint16_t _ac_iquant[64]){
ogg_int16_t __attribute__((aligned(8))) res_buf[64];
int dst_framei;
int dst_ystride;
int zzi;
/*_last_zzi is subtly different from an actual count of the number of
coefficients we decoded for this block.
It contains the value of zzi BEFORE the final token in the block was
decoded.
In most cases this is an EOB token (the continuation of an EOB run from a
previous block counts), and so this is the same as the coefficient count.
However, in the case that the last token was NOT an EOB token, but filled
the block up with exactly 64 coefficients, _last_zzi will be less than 64.
Provided the last token was not a pure zero run, the minimum value it can
be is 46, and so that doesn't affect any of the cases in this routine.
However, if the last token WAS a pure zero run of length 63, then _last_zzi
will be 1 while the number of coefficients decoded is 64.
Thus, we will trigger the following special case, where the real
coefficient count would not.
Note also that a zero run of length 64 will give _last_zzi a value of 0,
but we still process the DC coefficient, which might have a non-zero value
due to DC prediction.
Although convoluted, this is arguably the correct behavior: it allows us to
dequantize fewer coefficients and use a smaller transform when the block
ends with a long zero run instead of a normal EOB token.
It could be smarter... multiple separate zero runs at the end of a block
will fool it, but an encoder that generates these really deserves what it
gets.
Needless to say we inherited this approach from VP3.*/
/*Special case only having a DC component.*/
if(_last_zzi<2){
ogg_uint16_t p;
/*Why is the iquant product rounded in this case and no others?
Who knows.*/
p=(ogg_int16_t)((ogg_int32_t)_frag->dc*_dc_iquant+15>>5);
/*Fill res_buf with p.*/
__asm__ __volatile__(
/*mm0=0000 0000 0000 AAAA*/
"movd %[p],%%mm0\n\t"
/*mm1=0000 0000 0000 AAAA*/
"movd %[p],%%mm1\n\t"
/*mm0=0000 0000 AAAA 0000*/
"pslld $16,%%mm0\n\t"
/*mm0=0000 0000 AAAA AAAA*/
"por %%mm1,%%mm0\n\t"
/*mm0=AAAA AAAA AAAA AAAA*/
"punpcklwd %%mm0,%%mm0\n\t"
"movq %%mm0,(%[res_buf])\n\t"
"movq %%mm0,8(%[res_buf])\n\t"
"movq %%mm0,16(%[res_buf])\n\t"
"movq %%mm0,24(%[res_buf])\n\t"
"movq %%mm0,32(%[res_buf])\n\t"
"movq %%mm0,40(%[res_buf])\n\t"
"movq %%mm0,48(%[res_buf])\n\t"
"movq %%mm0,56(%[res_buf])\n\t"
"movq %%mm0,64(%[res_buf])\n\t"
"movq %%mm0,72(%[res_buf])\n\t"
"movq %%mm0,80(%[res_buf])\n\t"
"movq %%mm0,88(%[res_buf])\n\t"
"movq %%mm0,96(%[res_buf])\n\t"
"movq %%mm0,104(%[res_buf])\n\t"
"movq %%mm0,112(%[res_buf])\n\t"
"movq %%mm0,120(%[res_buf])\n\t"
:
:[res_buf]"r"(res_buf),[p]"r"((unsigned)p)
:"memory"
);
}
else{
/*Then, fill in the remainder of the coefficients with 0's, and perform
the iDCT.*/
/*First zero the buffer.*/
/*On K7, etc., this could be replaced with movntq and sfence.*/
__asm__ __volatile__(
"pxor %%mm0,%%mm0\n\t"
"movq %%mm0,(%[res_buf])\n\t"
"movq %%mm0,8(%[res_buf])\n\t"
"movq %%mm0,16(%[res_buf])\n\t"
"movq %%mm0,24(%[res_buf])\n\t"
"movq %%mm0,32(%[res_buf])\n\t"
"movq %%mm0,40(%[res_buf])\n\t"
"movq %%mm0,48(%[res_buf])\n\t"
"movq %%mm0,56(%[res_buf])\n\t"
"movq %%mm0,64(%[res_buf])\n\t"
"movq %%mm0,72(%[res_buf])\n\t"
"movq %%mm0,80(%[res_buf])\n\t"
"movq %%mm0,88(%[res_buf])\n\t"
"movq %%mm0,96(%[res_buf])\n\t"
"movq %%mm0,104(%[res_buf])\n\t"
"movq %%mm0,112(%[res_buf])\n\t"
"movq %%mm0,120(%[res_buf])\n\t"
:
:[res_buf]"r"(res_buf)
:"memory"
);
res_buf[0]=(ogg_int16_t)((ogg_int32_t)_frag->dc*_dc_iquant);
/*This is planned to be rewritten in MMX.*/
for(zzi=1;zzi<_ncoefs;zzi++){
int ci;
ci=OC_FZIG_ZAG[zzi];
res_buf[OC_FZIG_ZAGMMX[zzi]]=(ogg_int16_t)((ogg_int32_t)_dct_coeffs[zzi]*
_ac_iquant[ci]);
}
if(_last_zzi<10)oc_idct8x8_10_mmx(res_buf);
else oc_idct8x8_mmx(res_buf);
}
/*Fill in the target buffer.*/
dst_framei=_state->ref_frame_idx[OC_FRAME_SELF];
dst_ystride=_state->ref_frame_bufs[dst_framei][_pli].stride;
/*For now ystride values in all ref frames assumed to be equal.*/
if(_frag->mbmode==OC_MODE_INTRA){
oc_frag_recon_intra_mmx(_frag->buffer[dst_framei],dst_ystride,res_buf);
}
else{
int ref_framei;
int ref_ystride;
int mvoffsets[2];
ref_framei=_state->ref_frame_idx[OC_FRAME_FOR_MODE[_frag->mbmode]];
ref_ystride=_state->ref_frame_bufs[ref_framei][_pli].stride;
if(oc_state_get_mv_offsets(_state,mvoffsets,_frag->mv[0],_frag->mv[1],
ref_ystride,_pli)>1){
oc_frag_recon_inter2_mmx(_frag->buffer[dst_framei],dst_ystride,
_frag->buffer[ref_framei]+mvoffsets[0],ref_ystride,
_frag->buffer[ref_framei]+mvoffsets[1],ref_ystride,res_buf);
}
else{
oc_frag_recon_inter_mmx(_frag->buffer[dst_framei],dst_ystride,
_frag->buffer[ref_framei]+mvoffsets[0],ref_ystride,res_buf);
}
}
oc_restore_fpu(_state);
}
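
The DC-only special case above can be restated in portable C as a sketch. The sk_ names are hypothetical, and a plain loop replaces the MMX fill: the dequantized DC value is rounded with +15 and shifted right by 5, and that single value is replicated across the whole 8x8 residual block.

```c
#include <stdint.h>

/*Computes the flat residual value used when only the DC coefficient is
   present: (dc*dc_iquant+15)>>5.
  Assumes >> on a negative value is an arithmetic shift, as the original
   expression does.*/
static int16_t sk_dc_only_value(int16_t _dc,uint16_t _dc_iquant){
  return (int16_t)(((int32_t)_dc*_dc_iquant+15)>>5);
}

/*Scalar equivalent of the movq fill: every entry of the block gets the same
   value, so no inverse DCT is needed.*/
static void sk_fill_dc_only(int16_t _res[64],int16_t _dc,
 uint16_t _dc_iquant){
  int16_t p;
  int     i;
  p=sk_dc_only_value(_dc,_dc_iquant);
  for(i=0;i<64;i++)_res[i]=p;
}
```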
/*Copies the fragments specified by the lists of fragment indices from one
frame to another.
_fragis: A pointer to a list of fragment indices.
_nfragis: The number of fragment indices to copy.
_dst_frame: The reference frame to copy to.
_src_frame: The reference frame to copy from.
_pli: The color plane the fragments lie in.*/
void oc_state_frag_copy_mmx(const oc_theora_state *_state,const int *_fragis,
int _nfragis,int _dst_frame,int _src_frame,int _pli){
const int *fragi;
const int *fragi_end;
int dst_framei;
ptrdiff_t dst_ystride;
int src_framei;
ptrdiff_t src_ystride;
dst_framei=_state->ref_frame_idx[_dst_frame];
src_framei=_state->ref_frame_idx[_src_frame];
dst_ystride=_state->ref_frame_bufs[dst_framei][_pli].stride;
src_ystride=_state->ref_frame_bufs[src_framei][_pli].stride;
fragi_end=_fragis+_nfragis;
for(fragi=_fragis;fragi<fragi_end;fragi++){
oc_fragment *frag;
unsigned char *dst;
unsigned char *src;
ptrdiff_t s;
frag=_state->frags+*fragi;
dst=frag->buffer[dst_framei];
src=frag->buffer[src_framei];
__asm__ __volatile__(
/*src+0*src_ystride*/
"movq (%[src]),%%mm0\n\t"
/*s=src_ystride*3*/
"lea (%[src_ystride],%[src_ystride],2),%[s]\n\t"
/*src+1*src_ystride*/
"movq (%[src],%[src_ystride]),%%mm1\n\t"
/*src+2*src_ystride*/
"movq (%[src],%[src_ystride],2),%%mm2\n\t"
/*src+3*src_ystride*/
"movq (%[src],%[s]),%%mm3\n\t"
/*dst+0*dst_ystride*/
"movq %%mm0,(%[dst])\n\t"
/*s=dst_ystride*3*/
"lea (%[dst_ystride],%[dst_ystride],2),%[s]\n\t"
/*dst+1*dst_ystride*/
"movq %%mm1,(%[dst],%[dst_ystride])\n\t"
/*Pointer to next 4.*/
"lea (%[src],%[src_ystride],4),%[src]\n\t"
/*dst+2*dst_ystride*/
"movq %%mm2,(%[dst],%[dst_ystride],2)\n\t"
/*dst+3*dst_ystride*/
"movq %%mm3,(%[dst],%[s])\n\t"
/*Pointer to next 4.*/
"lea (%[dst],%[dst_ystride],4),%[dst]\n\t"
/*src+0*src_ystride*/
"movq (%[src]),%%mm0\n\t"
/*s=src_ystride*3*/
"lea (%[src_ystride],%[src_ystride],2),%[s]\n\t"
/*src+1*src_ystride*/
"movq (%[src],%[src_ystride]),%%mm1\n\t"
/*src+2*src_ystride*/
"movq (%[src],%[src_ystride],2),%%mm2\n\t"
/*src+3*src_ystride*/
"movq (%[src],%[s]),%%mm3\n\t"
/*dst+0*dst_ystride*/
"movq %%mm0,(%[dst])\n\t"
/*s=dst_ystride*3*/
"lea (%[dst_ystride],%[dst_ystride],2),%[s]\n\t"
/*dst+1*dst_ystride*/
"movq %%mm1,(%[dst],%[dst_ystride])\n\t"
/*dst+2*dst_ystride*/
"movq %%mm2,(%[dst],%[dst_ystride],2)\n\t"
/*dst+3*dst_ystride*/
"movq %%mm3,(%[dst],%[s])\n\t"
:[s]"=&r"(s)
:[dst]"r"(dst),[src]"r"(src),[dst_ystride]"r"(dst_ystride),
[src_ystride]"r"(src_ystride)
:"memory"
);
}
/*This needs to be removed when decode specific functions are implemented:*/
__asm__ __volatile__("emms\n\t");
}
static void loop_filter_v(unsigned char *_pix,int _ystride,
const ogg_int16_t *_ll){
ptrdiff_t s;
_pix-=_ystride*2;
__asm__ __volatile__(
/*mm0=0*/
"pxor %%mm0,%%mm0\n\t"
/*s=_ystride*3*/
"lea (%[ystride],%[ystride],2),%[s]\n\t"
/*mm7=_pix[0...8]*/
"movq (%[pix]),%%mm7\n\t"
/*mm4=_pix[0...8+_ystride*3]*/
"movq (%[pix],%[s]),%%mm4\n\t"
/*mm6=_pix[0...8]*/
"movq %%mm7,%%mm6\n\t"
/*Expand unsigned _pix[0...3] to 16 bits.*/
"punpcklbw %%mm0,%%mm6\n\t"
"movq %%mm4,%%mm5\n\t"
/*Expand unsigned _pix[4...8] to 16 bits.*/
"punpckhbw %%mm0,%%mm7\n\t"
/*Expand other arrays too.*/
"punpcklbw %%mm0,%%mm4\n\t"
"punpckhbw %%mm0,%%mm5\n\t"
/*mm7:mm6=_pix[0...8]-_pix[0...8+_ystride*3]:*/
"psubw %%mm4,%%mm6\n\t"
"psubw %%mm5,%%mm7\n\t"
/*mm5=mm4=_pix[0...8+_ystride]*/
"movq (%[pix],%[ystride]),%%mm4\n\t"
/*mm1=mm3=mm2=_pix[0...8+_ystride*2]*/
"movq (%[pix],%[ystride],2),%%mm2\n\t"
"movq %%mm4,%%mm5\n\t"
"movq %%mm2,%%mm3\n\t"
"movq %%mm2,%%mm1\n\t"
/*Expand these arrays.*/
"punpckhbw %%mm0,%%mm5\n\t"
"punpcklbw %%mm0,%%mm4\n\t"
"punpckhbw %%mm0,%%mm3\n\t"
"punpcklbw %%mm0,%%mm2\n\t"
/*mm0=3 3 3 3
mm3:mm2=_pix[0...8+_ystride*2]-_pix[0...8+_ystride]*/
"pcmpeqw %%mm0,%%mm0\n\t"
"psubw %%mm5,%%mm3\n\t"
"psrlw $14,%%mm0\n\t"
"psubw %%mm4,%%mm2\n\t"
/*Scale by 3.*/
"pmullw %%mm0,%%mm3\n\t"
"pmullw %%mm0,%%mm2\n\t"
/*mm0=4 4 4 4
f=mm3:mm2==_pix[0...8]-_pix[0...8+_ystride*3]+
3*(_pix[0...8+_ystride*2]-_pix[0...8+_ystride])*/
"psrlw $1,%%mm0\n\t"
"paddw %%mm7,%%mm3\n\t"
"psllw $2,%%mm0\n\t"
"paddw %%mm6,%%mm2\n\t"
/*Add 4.*/
"paddw %%mm0,%%mm3\n\t"
"paddw %%mm0,%%mm2\n\t"
/*"Divide" by 8.*/
"psraw $3,%%mm3\n\t"
"psraw $3,%%mm2\n\t"
/*Now compute lflim of mm3:mm2 cf. Section 7.10 of the spec.*/
/*Free up mm5.*/
"packuswb %%mm5,%%mm4\n\t"
/*mm0=L L L L*/
"movq (%[ll]),%%mm0\n\t"
/*if(R_i<-2L||R_i>2L)R_i=0:*/
"movq %%mm2,%%mm5\n\t"
"pxor %%mm6,%%mm6\n\t"
"movq %%mm0,%%mm7\n\t"
"psubw %%mm0,%%mm6\n\t"
"psllw $1,%%mm7\n\t"
"psllw $1,%%mm6\n\t"
/*mm2==R_3 R_2 R_1 R_0*/
/*mm5==R_3 R_2 R_1 R_0*/
/*mm6==-2L -2L -2L -2L*/
/*mm7==2L 2L 2L 2L*/
"pcmpgtw %%mm2,%%mm7\n\t"
"pcmpgtw %%mm6,%%mm5\n\t"
"pand %%mm7,%%mm2\n\t"
"movq %%mm0,%%mm7\n\t"
"pand %%mm5,%%mm2\n\t"
"psllw $1,%%mm7\n\t"
"movq %%mm3,%%mm5\n\t"
/*mm3==R_7 R_6 R_5 R_4*/
/*mm5==R_7 R_6 R_5 R_4*/
/*mm6==-2L -2L -2L -2L*/
/*mm7==2L 2L 2L 2L*/
"pcmpgtw %%mm3,%%mm7\n\t"
"pcmpgtw %%mm6,%%mm5\n\t"
"pand %%mm7,%%mm3\n\t"
"movq %%mm0,%%mm7\n\t"
"pand %%mm5,%%mm3\n\t"
/*if(R_i<-L)R_i'=R_i+2L;
if(R_i>L)R_i'=R_i-2L;
if(R_i<-L||R_i>L)R_i=-R_i':*/
"psraw $1,%%mm6\n\t"
"movq %%mm2,%%mm5\n\t"
"psllw $1,%%mm7\n\t"
/*mm2==R_3 R_2 R_1 R_0*/
/*mm5==R_3 R_2 R_1 R_0*/
/*mm6==-L -L -L -L*/
/*mm0==L L L L*/
/*mm5=R_i>L?FF:00*/
"pcmpgtw %%mm0,%%mm5\n\t"
/*mm6=-L>R_i?FF:00*/
"pcmpgtw %%mm2,%%mm6\n\t"
/*mm7=R_i>L?2L:0*/
"pand %%mm5,%%mm7\n\t"
/*mm2=R_i>L?R_i-2L:R_i*/
"psubw %%mm7,%%mm2\n\t"
"movq %%mm0,%%mm7\n\t"
/*mm5=-L>R_i||R_i>L*/
"por %%mm6,%%mm5\n\t"
"psllw $1,%%mm7\n\t"
/*mm7=-L>R_i?2L:0*/
"pand %%mm6,%%mm7\n\t"
"pxor %%mm6,%%mm6\n\t"
/*mm2=-L>R_i?R_i+2L:R_i*/
"paddw %%mm7,%%mm2\n\t"
"psubw %%mm0,%%mm6\n\t"
/*mm5=-L>R_i||R_i>L?-R_i':0*/
"pand %%mm2,%%mm5\n\t"
"movq %%mm0,%%mm7\n\t"
/*mm2=-L>R_i||R_i>L?0:R_i*/
"psubw %%mm5,%%mm2\n\t"
"psllw $1,%%mm7\n\t"
/*mm2=-L>R_i||R_i>L?-R_i':R_i*/
"psubw %%mm5,%%mm2\n\t"
"movq %%mm3,%%mm5\n\t"
/*mm3==R_7 R_6 R_5 R_4*/
/*mm5==R_7 R_6 R_5 R_4*/
/*mm6==-L -L -L -L*/
/*mm0==L L L L*/
/*mm6=-L>R_i?FF:00*/
"pcmpgtw %%mm3,%%mm6\n\t"
/*mm5=R_i>L?FF:00*/
"pcmpgtw %%mm0,%%mm5\n\t"
/*mm7=R_i>L?2L:0*/
"pand %%mm5,%%mm7\n\t"
/*mm3=R_i>L?R_i-2L:R_i*/
"psubw %%mm7,%%mm3\n\t"
"psllw $1,%%mm0\n\t"
/*mm5=-L>R_i||R_i>L*/
"por %%mm6,%%mm5\n\t"
/*mm0=-L>R_i?2L:0*/
"pand %%mm6,%%mm0\n\t"
/*mm3=-L>R_i?R_i+2L:R_i*/
"paddw %%mm0,%%mm3\n\t"
/*mm5=-L>R_i||R_i>L?-R_i':0*/
"pand %%mm3,%%mm5\n\t"
/*mm3=-L>R_i||R_i>L?0:R_i*/
"psubw %%mm5,%%mm3\n\t"
/*mm3=-L>R_i||R_i>L?-R_i':R_i*/
"psubw %%mm5,%%mm3\n\t"
/*Unfortunately, there's no unsigned byte+signed byte with unsigned
saturation op code, so we have to promote things back to 16 bits.*/
"pxor %%mm0,%%mm0\n\t"
"movq %%mm4,%%mm5\n\t"
"punpcklbw %%mm0,%%mm4\n\t"
"punpckhbw %%mm0,%%mm5\n\t"
"movq %%mm1,%%mm6\n\t"
"punpcklbw %%mm0,%%mm1\n\t"
"punpckhbw %%mm0,%%mm6\n\t"
/*_pix[0...8+_ystride]+=R_i*/
"paddw %%mm2,%%mm4\n\t"
"paddw %%mm3,%%mm5\n\t"
/*_pix[0...8+_ystride*2]-=R_i*/
"psubw %%mm2,%%mm1\n\t"
"psubw %%mm3,%%mm6\n\t"
"packuswb %%mm5,%%mm4\n\t"
"packuswb %%mm6,%%mm1\n\t"
/*Write it back out.*/
"movq %%mm4,(%[pix],%[ystride])\n\t"
"movq %%mm1,(%[pix],%[ystride],2)\n\t"
:[s]"=&r"(s)
:[pix]"r"(_pix),[ystride]"r"((ptrdiff_t)_ystride),[ll]"r"(_ll)
:"memory"
);
}
/*This code implements the bulk of loop_filter_h().
Data are striped p0 p1 p2 p3 ... p0 p1 p2 p3 ..., so in order to load all
four p0's to one register we must transpose the values in four mmx regs.
When half is done we repeat this for the rest.*/
static void loop_filter_h4(unsigned char *_pix,ptrdiff_t _ystride,
const ogg_int16_t *_ll){
ptrdiff_t s;
/*d doesn't technically need to be 64-bit on x86-64, but making it so will
help avoid partial register stalls.*/
ptrdiff_t d;
__asm__ __volatile__(
/*x x x x 3 2 1 0*/
"movd (%[pix]),%%mm0\n\t"
/*s=_ystride*3*/
"lea (%[ystride],%[ystride],2),%[s]\n\t"
/*x x x x 7 6 5 4*/
"movd (%[pix],%[ystride]),%%mm1\n\t"
/*x x x x B A 9 8*/
"movd (%[pix],%[ystride],2),%%mm2\n\t"
/*x x x x F E D C*/
"movd (%[pix],%[s]),%%mm3\n\t"
/*mm0=7 3 6 2 5 1 4 0*/
"punpcklbw %%mm1,%%mm0\n\t"
/*mm2=F B E A D 9 C 8*/
"punpcklbw %%mm3,%%mm2\n\t"
/*mm1=7 3 6 2 5 1 4 0*/
"movq %%mm0,%%mm1\n\t"
/*mm0=F B 7 3 E A 6 2*/
"punpckhwd %%mm2,%%mm0\n\t"
/*mm1=D 9 5 1 C 8 4 0*/
"punpcklwd %%mm2,%%mm1\n\t"
"pxor %%mm7,%%mm7\n\t"
/*mm5=D 9 5 1 C 8 4 0*/
"movq %%mm1,%%mm5\n\t"
/*mm1=x C x 8 x 4 x 0==pix[0]*/
"punpcklbw %%mm7,%%mm1\n\t"
/*mm5=x D x 9 x 5 x 1==pix[1]*/
"punpckhbw %%mm7,%%mm5\n\t"
/*mm3=F B 7 3 E A 6 2*/
"movq %%mm0,%%mm3\n\t"
/*mm0=x E x A x 6 x 2==pix[2]*/
"punpcklbw %%mm7,%%mm0\n\t"
/*mm3=x F x B x 7 x 3==pix[3]*/
"punpckhbw %%mm7,%%mm3\n\t"
/*mm1=mm1-mm3==pix[0]-pix[3]*/
"psubw %%mm3,%%mm1\n\t"
/*Save a copy of pix[2] for later.*/
"movq %%mm0,%%mm4\n\t"
/*mm2=3 3 3 3
mm0=mm0-mm5==pix[2]-pix[1]*/
"pcmpeqw %%mm2,%%mm2\n\t"
"psubw %%mm5,%%mm0\n\t"
"psrlw $14,%%mm2\n\t"
/*Scale by 3.*/
"pmullw %%mm2,%%mm0\n\t"
/*mm2=4 4 4 4
f=mm0==_pix[0]-_pix[3]+3*(_pix[2]-_pix[1])*/
"psrlw $1,%%mm2\n\t"
"paddw %%mm1,%%mm0\n\t"
"psllw $2,%%mm2\n\t"
/*Add 4.*/
"paddw %%mm2,%%mm0\n\t"
/*"Divide" by 8, producing the residuals R_i.*/
"psraw $3,%%mm0\n\t"
/*Now compute lflim of mm0 cf. Section 7.10 of the spec.*/
/*mm6=L L L L*/
"movq (%[ll]),%%mm6\n\t"
/*if(R_i<-2L||R_i>2L)R_i=0:*/
"movq %%mm0,%%mm1\n\t"
"pxor %%mm2,%%mm2\n\t"
"movq %%mm6,%%mm3\n\t"
"psubw %%mm6,%%mm2\n\t"
"psllw $1,%%mm3\n\t"
"psllw $1,%%mm2\n\t"
/*mm0==R_3 R_2 R_1 R_0*/
/*mm1==R_3 R_2 R_1 R_0*/
/*mm2==-2L -2L -2L -2L*/
/*mm3==2L 2L 2L 2L*/
"pcmpgtw %%mm0,%%mm3\n\t"
"pcmpgtw %%mm2,%%mm1\n\t"
"pand %%mm3,%%mm0\n\t"
"pand %%mm1,%%mm0\n\t"
/*if(R_i<-L)R_i'=R_i+2L;
if(R_i>L)R_i'=R_i-2L;
if(R_i<-L||R_i>L)R_i=-R_i':*/
"psraw $1,%%mm2\n\t"
"movq %%mm0,%%mm1\n\t"
"movq %%mm6,%%mm3\n\t"
/*mm0==R_3 R_2 R_1 R_0*/
/*mm1==R_3 R_2 R_1 R_0*/
/*mm2==-L -L -L -L*/
/*mm6==L L L L*/
/*mm2=-L>R_i?FF:00*/
"pcmpgtw %%mm0,%%mm2\n\t"
/*mm1=R_i>L?FF:00*/
"pcmpgtw %%mm6,%%mm1\n\t"
/*mm3=2L 2L 2L 2L*/
"psllw $1,%%mm3\n\t"
/*mm6=2L 2L 2L 2L*/
"psllw $1,%%mm6\n\t"
/*mm3=R_i>L?2L:0*/
"pand %%mm1,%%mm3\n\t"
/*mm6=-L>R_i?2L:0*/
"pand %%mm2,%%mm6\n\t"
/*mm0=R_i>L?R_i-2L:R_i*/
"psubw %%mm3,%%mm0\n\t"
/*mm1=-L>R_i||R_i>L*/
"por %%mm2,%%mm1\n\t"
/*mm0=-L>R_i?R_i+2L:R_i*/
"paddw %%mm6,%%mm0\n\t"
/*mm1=-L>R_i||R_i>L?R_i':0*/
"pand %%mm0,%%mm1\n\t"
/*mm0=-L>R_i||R_i>L?0:R_i*/
"psubw %%mm1,%%mm0\n\t"
/*mm0=-L>R_i||R_i>L?-R_i':R_i*/
"psubw %%mm1,%%mm0\n\t"
/*_pix[1]+=R_i;*/
"paddw %%mm0,%%mm5\n\t"
/*_pix[2]-=R_i;*/
"psubw %%mm0,%%mm4\n\t"
/*mm5=x x x x D 9 5 1*/
"packuswb %%mm7,%%mm5\n\t"
/*mm4=x x x x E A 6 2*/
"packuswb %%mm7,%%mm4\n\t"
/*mm5=E D A 9 6 5 2 1*/
"punpcklbw %%mm4,%%mm5\n\t"
/*d=6 5 2 1*/
"movd %%mm5,%[d]\n\t"
"movw %w[d],1(%[pix])\n\t"
/*Why is there such a big stall here?*/
"psrlq $32,%%mm5\n\t"
"shr $16,%[d]\n\t"
"movw %w[d],1(%[pix],%[ystride])\n\t"
/*d=E D A 9*/
"movd %%mm5,%[d]\n\t"
"movw %w[d],1(%[pix],%[ystride],2)\n\t"
"shr $16,%[d]\n\t"
"movw %w[d],1(%[pix],%[s])\n\t"
:[s]"=&r"(s),[d]"=&r"(d),
[pix]"+r"(_pix),[ystride]"+r"(_ystride),[ll]"+r"(_ll)
:
:"memory"
);
}
static void loop_filter_h(unsigned char *_pix,int _ystride,
const ogg_int16_t *_ll){
_pix-=2;
loop_filter_h4(_pix,_ystride,_ll);
loop_filter_h4(_pix+(_ystride<<2),_ystride,_ll);
}
/*We copy the whole function because the MMX routines will be inlined 4 times,
and we can do just a single emms call at the end this way.
We also do not use the _bv lookup table, instead computing the values that
would lie in it on the fly.*/
/*Apply the loop filter to a given set of fragment rows in the given plane.
The filter may be run on the bottom edge, affecting pixels in the next row of
fragments, so this row also needs to be available.
_bv: The bounding values array.
_refi: The index of the frame buffer to filter.
_pli: The color plane to filter.
_fragy0: The Y coordinate of the first fragment row to filter.
_fragy_end: The Y coordinate of the fragment row to stop filtering at.*/
void oc_state_loop_filter_frag_rows_mmx(oc_theora_state *_state,int *_bv,
int _refi,int _pli,int _fragy0,int _fragy_end){
ogg_int16_t __attribute__((aligned(8))) ll[4];
th_img_plane *iplane;
oc_fragment_plane *fplane;
oc_fragment *frag_top;
oc_fragment *frag0;
oc_fragment *frag;
oc_fragment *frag_end;
oc_fragment *frag0_end;
oc_fragment *frag_bot;
ll[0]=ll[1]=ll[2]=ll[3]=
(ogg_int16_t)_state->loop_filter_limits[_state->qis[0]];
iplane=_state->ref_frame_bufs[_refi]+_pli;
fplane=_state->fplanes+_pli;
/*The following loops are constructed somewhat non-intuitively on purpose.
The main idea is: if a block boundary has at least one coded fragment on
it, the filter is applied to it.
However, the order that the filters are applied in matters, and VP3 chose
the somewhat strange ordering used below.*/
frag_top=_state->frags+fplane->froffset;
frag0=frag_top+_fragy0*fplane->nhfrags;
frag0_end=frag0+(_fragy_end-_fragy0)*fplane->nhfrags;
frag_bot=_state->frags+fplane->froffset+fplane->nfrags;
while(frag0<frag0_end){
frag=frag0;
frag_end=frag+fplane->nhfrags;
while(frag<frag_end){
if(frag->coded){
if(frag>frag0){
loop_filter_h(frag->buffer[_refi],iplane->stride,ll);
}
if(frag0>frag_top){
loop_filter_v(frag->buffer[_refi],iplane->stride,ll);
}
if(frag+1<frag_end&&!(frag+1)->coded){
loop_filter_h(frag->buffer[_refi]+8,iplane->stride,ll);
}
if(frag+fplane->nhfrags<frag_bot&&!(frag+fplane->nhfrags)->coded){
loop_filter_v((frag+fplane->nhfrags)->buffer[_refi],
iplane->stride,ll);
}
}
frag++;
}
frag0+=fplane->nhfrags;
}
/*This needs to be removed when decode specific functions are implemented:*/
__asm__ __volatile__("emms\n\t");
}
#endif
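
For readers following the MMX loop filter, the per-edge arithmetic described in the assembly comments can be sketched in scalar C. This is an assumed reading of those comments, not the library's reference code, and all sk_ names are hypothetical: compute f from four pixels across the edge, pass it through the lflim limiter, then add the result to the pixel before the edge and subtract it from the pixel after, saturating to 0...255.

```c
/*Limiter as described in the comments: responses at or beyond 2L are
   dropped, responses between L and 2L are folded back towards 0, and
   responses within [-L,L] pass through unchanged.*/
static int sk_lflim(int _r,int _l){
  if(_r<=-2*_l||_r>=2*_l)return 0;
  if(_r>_l)return 2*_l-_r;
  if(_r<-_l)return -2*_l-_r;
  return _r;
}

/*Saturates to 0...255, as packuswb does.*/
static unsigned char sk_sat255(int _x){
  return (unsigned char)(_x<0?0:_x>255?255:_x);
}

/*Filters one line of four pixels straddling a block edge; the edge lies
   between _p[1] and _p[2].*/
static void sk_filter4(unsigned char _p[4],int _l){
  int f;
  int r;
  f=_p[0]-_p[3]+3*(_p[2]-_p[1])+4;
  /*Floor division by 8, written without shifting a negative value.*/
  r=f>=0?f>>3:-((-f+7)>>3);
  r=sk_lflim(r,_l);
  _p[1]=sk_sat255(_p[1]+r);
  _p[2]=sk_sat255(_p[2]-r);
}
```

A small step across the edge is smoothed, while a large step (a real image edge) produces a response beyond 2L and is left untouched.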


@ -1,42 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: x86int.h 15400 2008-10-15 12:10:58Z tterribe $
********************************************************************/
#if !defined(_x86_x86int_H)
# define _x86_x86int_H (1)
# include "../../internal.h"
void oc_state_vtable_init_x86(oc_theora_state *_state);
void oc_frag_recon_intra_mmx(unsigned char *_dst,int _dst_ystride,
const ogg_int16_t *_residue);
void oc_frag_recon_inter_mmx(unsigned char *_dst,int _dst_ystride,
const unsigned char *_src,int _src_ystride,const ogg_int16_t *_residue);
void oc_frag_recon_inter2_mmx(unsigned char *_dst,int _dst_ystride,
const unsigned char *_src1,int _src1_ystride,const unsigned char *_src2,
int _src2_ystride,const ogg_int16_t *_residue);
void oc_state_frag_copy_mmx(const oc_theora_state *_state,const int *_fragis,
int _nfragis,int _dst_frame,int _src_frame,int _pli);
void oc_state_frag_recon_mmx(oc_theora_state *_state,oc_fragment *_frag,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,int _ncoefs,
ogg_uint16_t _dc_iquant,const ogg_uint16_t _ac_iquant[64]);
void oc_restore_fpu_mmx(void);
void oc_idct8x8_mmx(ogg_int16_t _y[64]);
void oc_idct8x8_10_mmx(ogg_int16_t _y[64]);
void oc_fill_idct_constants_mmx(void);
void oc_state_loop_filter_frag_rows_mmx(oc_theora_state *_state,int *_bv,
int _refi,int _pli,int _fragy0,int _fragy_end);
#endif


@ -1,214 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id:
********************************************************************/
#include "../../internal.h"
/* ------------------------------------------------------------------------
MMX reconstruction fragment routines for Visual Studio.
Tested with VS2005. Should compile for VS2003 and VC6 as well.
Initial implementation 2007 by Nils Pipenbrinck.
---------------------------------------------------------------------*/
#if defined(USE_ASM)
void oc_frag_recon_intra_mmx(unsigned char *_dst,int _dst_ystride,
const ogg_int16_t *_residue){
/* ---------------------------------------------------------------------
This function does the intra reconstruction step with 8 iterations
unrolled. The iteration for each instruction is noted by the #id in the
comments (in case you want to reconstruct it)
--------------------------------------------------------------------- */
_asm{
mov edi, [_residue] /* load residue ptr */
mov eax, 0x00800080 /* generate constant */
mov ebx, [_dst_ystride] /* load dst-stride */
mov edx, [_dst] /* load dest pointer */
/* unrolled loop begins here */
movd mm0, eax /* load constant */
movq mm1, [edi+ 8*0] /* #1 load low residue */
movq mm2, [edi+ 8*1] /* #1 load high residue */
punpckldq mm0, mm0 /* build constant */
movq mm3, [edi+ 8*2] /* #2 load low residue */
movq mm4, [edi+ 8*3] /* #2 load high residue */
movq mm5, [edi+ 8*4] /* #3 load low residue */
movq mm6, [edi+ 8*5] /* #3 load high residue */
paddsw mm1, mm0 /* #1 bias low residue */
paddsw mm2, mm0 /* #1 bias high residue */
packuswb mm1, mm2 /* #1 pack to byte */
paddsw mm3, mm0 /* #2 bias low residue */
paddsw mm4, mm0 /* #2 bias high residue */
packuswb mm3, mm4 /* #2 pack to byte */
paddsw mm5, mm0 /* #3 bias low residue */
paddsw mm6, mm0 /* #3 bias high residue */
packuswb mm5, mm6 /* #3 pack to byte */
movq [edx], mm1 /* #1 write row */
movq [edx + ebx], mm3 /* #2 write row */
movq [edx + ebx*2], mm5 /* #3 write row */
movq mm1, [edi+ 8*6] /* #4 load low residue */
lea ecx, [ebx + ebx*2] /* make dst_ystride * 3 */
movq mm2, [edi+ 8*7] /* #4 load high residue */
movq mm3, [edi+ 8*8] /* #5 load low residue */
lea esi, [ebx*4 + ebx] /* make dst_ystride * 5 */
movq mm4, [edi+ 8*9] /* #5 load high residue */
movq mm5, [edi+ 8*10] /* #6 load low residue */
lea eax, [ecx*2 + ebx] /* make dst_ystride * 7 */
movq mm6, [edi+ 8*11] /* #6 load high residue */
paddsw mm1, mm0 /* #4 bias low residue */
paddsw mm2, mm0 /* #4 bias high residue */
packuswb mm1, mm2 /* #4 pack to byte */
paddsw mm3, mm0 /* #5 bias low residue */
paddsw mm4, mm0 /* #5 bias high residue */
packuswb mm3, mm4 /* #5 pack to byte */
paddsw mm5, mm0 /* #6 bias low residue */
paddsw mm6, mm0 /* #6 bias high residue */
packuswb mm5, mm6 /* #6 pack to byte */
movq [edx + ecx], mm1 /* #4 write row */
movq [edx + ebx*4], mm3 /* #5 write row */
movq [edx + esi], mm5 /* #6 write row */
movq mm1, [edi+ 8*12] /* #7 load low residue */
movq mm2, [edi+ 8*13] /* #7 load high residue */
movq mm3, [edi+ 8*14] /* #8 load low residue */
movq mm4, [edi+ 8*15] /* #8 load high residue */
paddsw mm1, mm0 /* #7 bias low residue */
paddsw mm2, mm0 /* #7 bias high residue */
packuswb mm1, mm2 /* #7 pack to byte */
paddsw mm3, mm0 /* #8 bias low residue */
paddsw mm4, mm0 /* #8 bias high residue */
packuswb mm3, mm4 /* #8 pack to byte */
movq [edx + ecx*2], mm1 /* #7 write row */
movq [edx + eax], mm3 /* #8 write row */
}
}
void oc_frag_recon_inter_mmx (unsigned char *_dst, int _dst_ystride,
const unsigned char *_src, int _src_ystride, const ogg_int16_t *_residue){
/* ---------------------------------------------------------------------
This function does the inter reconstruction step with two iterations
running in parallel to hide some load-latencies and break the dependency
chains. The iteration for each instruction is noted by the #id in the
comments (in case you want to reconstruct it)
--------------------------------------------------------------------- */
_asm{
pxor mm0, mm0 /* generate constant 0 */
mov esi, [_src]
mov edi, [_residue]
mov eax, [_src_ystride]
mov edx, [_dst]
mov ebx, [_dst_ystride]
mov ecx, 4
align 16
nextchunk:
movq mm3, [esi] /* #1 load source */
movq mm1, [edi+0] /* #1 load residue low */
movq mm2, [edi+8] /* #1 load residue high */
movq mm7, [esi+eax] /* #2 load source */
movq mm4, mm3 /* #1 get copy of src */
movq mm5, [edi+16] /* #2 load residue low */
punpckhbw mm4, mm0 /* #1 expand high source */
movq mm6, [edi+24] /* #2 load residue high */
punpcklbw mm3, mm0 /* #1 expand low source */
paddsw mm4, mm2 /* #1 add residue high */
movq mm2, mm7 /* #2 get copy of src */
paddsw mm3, mm1 /* #1 add residue low */
punpckhbw mm2, mm0 /* #2 expand high source */
packuswb mm3, mm4 /* #1 final row pixels */
punpcklbw mm7, mm0 /* #2 expand low source */
movq [edx], mm3 /* #1 write row */
paddsw mm2, mm6 /* #2 add residue high */
add edi, 32 /* residue += 32 bytes (2 rows) */
paddsw mm7, mm5 /* #2 add residue low */
sub ecx, 1 /* update loop counter */
packuswb mm7, mm2 /* #2 final row */
lea esi, [esi+eax*2] /* src += stride * 2 */
movq [edx + ebx], mm7 /* #2 write row */
lea edx, [edx+ebx*2] /* dst += stride * 2 */
jne nextchunk
}
}
void oc_frag_recon_inter2_mmx(unsigned char *_dst, int _dst_ystride,
const unsigned char *_src1, int _src1_ystride, const unsigned char *_src2,
int _src2_ystride,const ogg_int16_t *_residue){
/* ---------------------------------------------------------------------
This function does the inter2 reconstruction step. The building of the
average is done with a bit-twiddling trick to avoid excessive register
copy work during byte to word conversion.
average = (a & b) + (((a ^ b) & 0xfe) >> 1);
(shown for a single byte; it's done with 8 of them at a time)
Slightly faster than the obvious method using add and shift, but not an
earthshaking improvement either.
If anyone comes up with a way that produces bit-identical outputs
using the pavgb instruction let me know and I'll do the 3dnow codepath.
--------------------------------------------------------------------- */
_asm{
mov eax, 0xfefefefe
mov esi, [_src1]
mov edi, [_src2]
movd mm1, eax
mov ebx, [_residue]
mov edx, [_dst]
mov eax, [_dst_ystride]
punpckldq mm1, mm1 /* replicate lsb32 */
mov ecx, 8 /* init loop counter */
pxor mm0, mm0 /* constant zero */
sub edx, eax /* dst -= dst_stride */
align 16
nextrow:
movq mm2, [esi] /* load source1 */
movq mm3, [edi] /* load source2 */
movq mm5, [ebx + 0] /* load lower residue */
movq mm6, [ebx + 8] /* load higher residue */
add esi, _src1_ystride /* src1 += src1_stride */
add edi, _src2_ystride /* src2 += src2_stride */
movq mm4, mm2 /* get copy of source1 */
pand mm2, mm3 /* s1 & s2 (avg part) */
pxor mm3, mm4 /* s1 ^ s2 (avg part) */
add ebx, 16 /* residue++ */
pand mm3, mm1 /* mask out low bits */
psrlq mm3, 1 /* shift xor avg-part */
paddd mm3, mm2 /* build final average */
add edx, eax /* dst += dst_stride */
movq mm2, mm3 /* get copy of average */
punpckhbw mm3, mm0 /* average high */
punpcklbw mm2, mm0 /* average low */
paddsw mm3, mm6 /* high + residue */
paddsw mm2, mm5 /* low + residue */
sub ecx, 1 /* update loop counter */
packuswb mm2, mm3 /* pack and saturate */
movq [edx], mm2 /* write row */
jne nextrow
}
}
void oc_restore_fpu_mmx(void){
_asm { emms }
}
#endif
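
The averaging identity quoted in the oc_frag_recon_inter2_mmx comment above can be checked in portable C. The following is a minimal sketch for illustration only, not part of libtheora: the function names are hypothetical, and a uint64_t stands in for one MMX register holding 8 pixels. It relies on a + b == 2*(a & b) + (a ^ b), so the truncating average is (a & b) + ((a ^ b) >> 1). The 0xfe mask keeps the shifted-out low bit of each byte from leaking into the neighbouring lane. The pavgb instruction computes (a + b + 1) >> 1, which rounds up, and that is why it is not bit-identical.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating average of two bytes without widening to 16 bits. */
static uint8_t avg_floor_u8(uint8_t a,uint8_t b){
  return (uint8_t)((a&b)+(((a^b)&0xFE)>>1));
}

/* The same identity applied to 8 packed bytes at once.
   No byte lane can overflow (the result is at most 255), so a plain 64-bit
   add is safe here. */
static uint64_t avg_floor_u8x8(uint64_t a,uint64_t b){
  return (a&b)+(((a^b)&UINT64_C(0xFEFEFEFEFEFEFEFE))>>1);
}

/* Exhaustively compare the single-byte form against (a+b)>>1. */
static void avg_self_check(void){
  int a;
  int b;
  for(a=0;a<256;a++){
    for(b=0;b<256;b++){
      assert(avg_floor_u8((uint8_t)a,(uint8_t)b)==((a+b)>>1));
    }
  }
}
```

Calling avg_self_check() once covers all 65536 byte pairs; the packed form follows because each lane is computed independently.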

File diff suppressed because it is too large

View file

@ -1,377 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id:
********************************************************************/
/* -------------------------------------------------------------------
MMX based loop filter for the theora codec.
Originally written by Rudolf Marek, based on code from On2's VP3.
Converted to Visual Studio inline assembly by Nils Pipenbrinck.
Note: I can't test these since my example files never get into the
loop filters, but the code has been converted semi-automatically from
the GCC sources, so it ought to work.
---------------------------------------------------------------------*/
#include "../../internal.h"
#include "x86int.h"
#include <mmintrin.h>
#if defined(USE_ASM)
static void loop_filter_v(unsigned char *_pix,int _ystride,
const ogg_int16_t *_ll){
_asm {
mov eax, [_pix]
mov edx, [_ystride]
mov ebx, [_ll]
/* _pix -= ystride */
sub eax, edx
/* mm0=0 */
pxor mm0, mm0
/* _pix -= ystride */
sub eax, edx
/* esi=_ystride*3 */
lea esi, [edx + edx*2]
/* mm7=_pix[0...8]*/
movq mm7, [eax]
/* mm4=_pix[0...8+_ystride*3]*/
movq mm4, [eax + esi]
/* mm6=_pix[0...8]*/
movq mm6, mm7
/* Expand unsigned _pix[0...3] to 16 bits.*/
punpcklbw mm6, mm0
movq mm5, mm4
/* Expand unsigned _pix[4...7] to 16 bits.*/
punpckhbw mm7, mm0
punpcklbw mm4, mm0
/* Expand other arrays too.*/
punpckhbw mm5, mm0
/*mm7:mm6=_p[0...7]-_p[0...7+_ystride*3]:*/
psubw mm6, mm4
psubw mm7, mm5
/*mm5=mm4=_pix[0...7+_ystride]*/
movq mm4, [eax + edx]
/*mm1=mm3=mm2=_pix[0...7+_ystride*2]*/
movq mm2, [eax + edx*2]
movq mm5, mm4
movq mm3, mm2
movq mm1, mm2
/*Expand these arrays.*/
punpckhbw mm5, mm0
punpcklbw mm4, mm0
punpckhbw mm3, mm0
punpcklbw mm2, mm0
pcmpeqw mm0, mm0
/*mm0=3 3 3 3
mm3:mm2=_pix[0...8+_ystride*2]-_pix[0...8+_ystride]*/
psubw mm3, mm5
psrlw mm0, 14
psubw mm2, mm4
/*Scale by 3.*/
pmullw mm3, mm0
pmullw mm2, mm0
/*mm0=4 4 4 4
f=mm3:mm2==_pix[0...8]-_pix[0...8+_ystride*3]+
3*(_pix[0...8+_ystride*2]-_pix[0...8+_ystride])*/
psrlw mm0, 1
paddw mm3, mm7
psllw mm0, 2
paddw mm2, mm6
/*Add 4.*/
paddw mm3, mm0
paddw mm2, mm0
/*"Divide" by 8.*/
psraw mm3, 3
psraw mm2, 3
/*Now compute lflim of mm3:mm2 cf. Section 7.10 of the spec.*/
/*Free up mm5.*/
packuswb mm4, mm5
/*mm0=L L L L*/
movq mm0, [ebx]
/*if(R_i<-2L||R_i>2L)R_i=0:*/
movq mm5, mm2
pxor mm6, mm6
movq mm7, mm0
psubw mm6, mm0
psllw mm7, 1
psllw mm6, 1
/*mm2==R_3 R_2 R_1 R_0*/
/*mm5==R_3 R_2 R_1 R_0*/
/*mm6==-2L -2L -2L -2L*/
/*mm7==2L 2L 2L 2L*/
pcmpgtw mm7, mm2
pcmpgtw mm5, mm6
pand mm2, mm7
movq mm7, mm0
pand mm2, mm5
psllw mm7, 1
movq mm5, mm3
/*mm3==R_7 R_6 R_5 R_4*/
/*mm5==R_7 R_6 R_5 R_4*/
/*mm6==-2L -2L -2L -2L*/
/*mm7==2L 2L 2L 2L*/
pcmpgtw mm7, mm3
pcmpgtw mm5, mm6
pand mm3, mm7
movq mm7, mm0
pand mm3, mm5
/*if(R_i<-L)R_i'=R_i+2L;
if(R_i>L)R_i'=R_i-2L;
if(R_i<-L||R_i>L)R_i=-R_i':*/
psraw mm6, 1
movq mm5, mm2
psllw mm7, 1
/*mm2==R_3 R_2 R_1 R_0*/
/*mm5==R_3 R_2 R_1 R_0*/
/*mm6==-L -L -L -L*/
/*mm0==L L L L*/
/*mm5=R_i>L?FF:00*/
pcmpgtw mm5, mm0
/*mm6=-L>R_i?FF:00*/
pcmpgtw mm6, mm2
/*mm7=R_i>L?2L:0*/
pand mm7, mm5
/*mm2=R_i>L?R_i-2L:R_i*/
psubw mm2, mm7
movq mm7, mm0
/*mm5=-L>R_i||R_i>L*/
por mm5, mm6
psllw mm7, 1
/*mm7=-L>R_i?2L:0*/
pand mm7, mm6
pxor mm6, mm6
/*mm2=-L>R_i?R_i+2L:R_i*/
paddw mm2, mm7
psubw mm6, mm0
/*mm5=-L>R_i||R_i>L?-R_i':0*/
pand mm5, mm2
movq mm7, mm0
/*mm2=-L>R_i||R_i>L?0:R_i*/
psubw mm2, mm5
psllw mm7, 1
/*mm2=-L>R_i||R_i>L?-R_i':R_i*/
psubw mm2, mm5
movq mm5, mm3
/*mm3==R_7 R_6 R_5 R_4*/
/*mm5==R_7 R_6 R_5 R_4*/
/*mm6==-L -L -L -L*/
/*mm0==L L L L*/
/*mm6=-L>R_i?FF:00*/
pcmpgtw mm6, mm3
/*mm5=R_i>L?FF:00*/
pcmpgtw mm5, mm0
/*mm7=R_i>L?2L:0*/
pand mm7, mm5
/*mm3=R_i>L?R_i-2L:R_i*/
psubw mm3, mm7
psllw mm0, 1
/*mm5=-L>R_i||R_i>L*/
por mm5, mm6
/*mm0=-L>R_i?2L:0*/
pand mm0, mm6
/*mm3=-L>R_i?R_i+2L:R_i*/
paddw mm3, mm0
/*mm5=-L>R_i||R_i>L?-R_i':0*/
pand mm5, mm3
/*mm3=-L>R_i||R_i>L?0:R_i*/
psubw mm3, mm5
/*mm3=-L>R_i||R_i>L?-R_i':R_i*/
psubw mm3, mm5
/*Unfortunately, there's no unsigned byte+signed byte with unsigned
saturation op code, so we have to promote things back 16 bits.*/
pxor mm0, mm0
movq mm5, mm4
punpcklbw mm4, mm0
punpckhbw mm5, mm0
movq mm6, mm1
punpcklbw mm1, mm0
punpckhbw mm6, mm0
/*_pix[0...8+_ystride]+=R_i*/
paddw mm4, mm2
paddw mm5, mm3
/*_pix[0...8+_ystride*2]-=R_i*/
psubw mm1, mm2
psubw mm6, mm3
packuswb mm4, mm5
packuswb mm1, mm6
/*Write it back out.*/
movq [eax + edx], mm4
movq [eax + edx*2], mm1
}
}
/*This code implements the bulk of loop_filter_h().
Data are striped p0 p1 p2 p3 ... p0 p1 p2 p3 ..., so in order to load all
four p0's to one register we must transpose the values in four mmx regs.
When half is done we repeat this for the rest.*/
static void loop_filter_h4(unsigned char *_pix,long _ystride,
const ogg_int16_t *_ll){
/* todo: merge the comments from the GCC sources */
_asm {
mov ecx, [_pix]
mov edx, [_ystride]
mov eax, [_ll]
/*esi=_ystride*3*/
lea esi, [edx + edx*2]
movd mm0, dword ptr [ecx]
movd mm1, dword ptr [ecx + edx]
movd mm2, dword ptr [ecx + edx*2]
movd mm3, dword ptr [ecx + esi]
punpcklbw mm0, mm1
punpcklbw mm2, mm3
movq mm1, mm0
punpckhwd mm0, mm2
punpcklwd mm1, mm2
pxor mm7, mm7
movq mm5, mm1
punpcklbw mm1, mm7
punpckhbw mm5, mm7
movq mm3, mm0
punpcklbw mm0, mm7
punpckhbw mm3, mm7
psubw mm1, mm3
movq mm4, mm0
pcmpeqw mm2, mm2
psubw mm0, mm5
psrlw mm2, 14
pmullw mm0, mm2
psrlw mm2, 1
paddw mm0, mm1
psllw mm2, 2
paddw mm0, mm2
psraw mm0, 3
movq mm6, qword ptr [eax]
movq mm1, mm0
pxor mm2, mm2
movq mm3, mm6
psubw mm2, mm6
psllw mm3, 1
psllw mm2, 1
pcmpgtw mm3, mm0
pcmpgtw mm1, mm2
pand mm0, mm3
pand mm0, mm1
psraw mm2, 1
movq mm1, mm0
movq mm3, mm6
pcmpgtw mm2, mm0
pcmpgtw mm1, mm6
psllw mm3, 1
psllw mm6, 1
pand mm3, mm1
pand mm6, mm2
psubw mm0, mm3
por mm1, mm2
paddw mm0, mm6
pand mm1, mm0
psubw mm0, mm1
psubw mm0, mm1
paddw mm5, mm0
psubw mm4, mm0
packuswb mm5, mm7
packuswb mm4, mm7
punpcklbw mm5, mm4
movd edi, mm5
mov word ptr [ecx + 01H], di
psrlq mm5, 32
shr edi, 16
mov word ptr [ecx + edx + 01H], di
movd edi, mm5
mov word ptr [ecx + edx*2 + 01H], di
shr edi, 16
mov word ptr [ecx + esi + 01H], di
}
}
static void loop_filter_h(unsigned char *_pix,int _ystride,
const ogg_int16_t *_ll){
_pix-=2;
loop_filter_h4(_pix,_ystride,_ll);
loop_filter_h4(_pix+(_ystride<<2),_ystride,_ll);
}
/*We copy the whole function because the MMX routines will be inlined 4 times,
and we can do just a single emms call at the end this way.
We also do not use the _bv lookup table, instead computing the values that
would lie in it on the fly.*/
/*Apply the loop filter to a given set of fragment rows in the given plane.
The filter may be run on the bottom edge, affecting pixels in the next row of
fragments, so this row also needs to be available.
_bv: The bounding values array.
_refi: The index of the frame buffer to filter.
_pli: The color plane to filter.
_fragy0: The Y coordinate of the first fragment row to filter.
_fragy_end: The Y coordinate of the fragment row to stop filtering at.*/
void oc_state_loop_filter_frag_rows_mmx(oc_theora_state *_state,int *_bv,
int _refi,int _pli,int _fragy0,int _fragy_end){
ogg_int16_t __declspec(align(8)) ll[4];
th_img_plane *iplane;
oc_fragment_plane *fplane;
oc_fragment *frag_top;
oc_fragment *frag0;
oc_fragment *frag;
oc_fragment *frag_end;
oc_fragment *frag0_end;
oc_fragment *frag_bot;
ll[0]=ll[1]=ll[2]=ll[3]=
(ogg_int16_t)_state->loop_filter_limits[_state->qis[0]];
iplane=_state->ref_frame_bufs[_refi]+_pli;
fplane=_state->fplanes+_pli;
/*The following loops are constructed somewhat non-intuitively on purpose.
The main idea is: if a block boundary has at least one coded fragment on
it, the filter is applied to it.
However, the order that the filters are applied in matters, and VP3 chose
the somewhat strange ordering used below.*/
frag_top=_state->frags+fplane->froffset;
frag0=frag_top+_fragy0*fplane->nhfrags;
frag0_end=frag0+(_fragy_end-_fragy0)*fplane->nhfrags;
frag_bot=_state->frags+fplane->froffset+fplane->nfrags;
while(frag0<frag0_end){
frag=frag0;
frag_end=frag+fplane->nhfrags;
while(frag<frag_end){
if(frag->coded){
if(frag>frag0){
loop_filter_h(frag->buffer[_refi],iplane->stride,ll);
}
if(frag0>frag_top){
loop_filter_v(frag->buffer[_refi],iplane->stride,ll);
}
if(frag+1<frag_end&&!(frag+1)->coded){
loop_filter_h(frag->buffer[_refi]+8,iplane->stride,ll);
}
if(frag+fplane->nhfrags<frag_bot&&!(frag+fplane->nhfrags)->coded){
loop_filter_v((frag+fplane->nhfrags)->buffer[_refi],
iplane->stride,ll);
}
}
frag++;
}
frag0+=fplane->nhfrags;
}
/*This needs to be removed when decode specific functions are implemented:*/
_mm_empty();
}
#endif
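
The comments in loop_filter_v above describe the filter arithmetic one register at a time. A scalar rendering may make it easier to follow. This is a sketch reconstructed from those comments only, with hypothetical helper names; it is not the libtheora C fallback. It handles one column of four pixels p[0..3] straddling a block edge, where L is the loop filter limit.

```c
#include <assert.h>

/* Floor division by 8; avoids right-shifting a negative int, which is
   implementation-defined in C (psraw in the MMX code is arithmetic). */
static int floor_div8(int x){
  return x>=0?x/8:-((-x+7)/8);
}

/* Limit function as described in the comments:
   zero outside (-2L,2L), identity inside [-L,L], reflected in between. */
static int lflim(int r,int l){
  assert(l>=0);
  if(r<=-2*l||r>=2*l)return 0;
  if(r<-l)return -(r+2*l);
  if(r>l)return -(r-2*l);
  return r;
}

/* Saturate to an unsigned byte, as packuswb does. */
static unsigned char clamp255(int x){
  return (unsigned char)(x<0?0:x>255?255:x);
}

/* Filter one column of 4 pixels across an edge lying between p[1] and p[2]. */
static void loop_filter_edge(unsigned char p[4],int l){
  int f;
  f=lflim(floor_div8(p[0]-p[3]+3*(p[2]-p[1])+4),l);
  p[1]=clamp255(p[1]+f);
  p[2]=clamp255(p[2]-f);
}
```

For example, with p = {10, 10, 20, 20} and L = 4 the raw filter value is (10 - 20 + 30 + 4) / 8 = 3, which passes through lflim unchanged, so the two inner pixels move to 13 and 17.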

View file

@ -1,189 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2008 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: mmxstate.c 15400 2008-10-15 12:10:58Z tterribe $
********************************************************************/
/* ------------------------------------------------------------------------
MMX acceleration of complete fragment reconstruction algorithm.
Originally written by Rudolf Marek.
Conversion to MSC intrinsics by Nils Pipenbrinck.
---------------------------------------------------------------------*/
#if defined(USE_ASM)
#include "../../internal.h"
#include "../idct.h"
#include "x86int.h"
#include <mmintrin.h>
static const unsigned char OC_FZIG_ZAGMMX[64]=
{
0, 8, 1, 2, 9,16,24,17,
10, 3,32,11,18,25, 4,12,
5,26,19,40,33,34,41,48,
27, 6,13,20,28,21,14, 7,
56,49,42,35,43,50,57,36,
15,22,29,30,23,44,37,58,
51,59,38,45,52,31,60,53,
46,39,47,54,61,62,55,63
};
/* Fill a block with value */
static __inline void loc_fill_mmx_value (__m64 * _dst, __m64 _value){
__m64 t = _value;
_dst[0] = t; _dst[1] = t; _dst[2] = t; _dst[3] = t;
_dst[4] = t; _dst[5] = t; _dst[6] = t; _dst[7] = t;
_dst[8] = t; _dst[9] = t; _dst[10] = t; _dst[11] = t;
_dst[12] = t; _dst[13] = t; _dst[14] = t; _dst[15] = t;
}
/* copy a block of 8 byte elements using different strides */
static __inline void loc_blockcopy_mmx (unsigned char * _dst, int _dst_ystride,
unsigned char * _src, int _src_ystride){
__m64 a,b,c,d,e,f,g,h;
a = *(__m64*)(_src + 0 * _src_ystride);
b = *(__m64*)(_src + 1 * _src_ystride);
c = *(__m64*)(_src + 2 * _src_ystride);
d = *(__m64*)(_src + 3 * _src_ystride);
e = *(__m64*)(_src + 4 * _src_ystride);
f = *(__m64*)(_src + 5 * _src_ystride);
g = *(__m64*)(_src + 6 * _src_ystride);
h = *(__m64*)(_src + 7 * _src_ystride);
*(__m64*)(_dst + 0 * _dst_ystride) = a;
*(__m64*)(_dst + 1 * _dst_ystride) = b;
*(__m64*)(_dst + 2 * _dst_ystride) = c;
*(__m64*)(_dst + 3 * _dst_ystride) = d;
*(__m64*)(_dst + 4 * _dst_ystride) = e;
*(__m64*)(_dst + 5 * _dst_ystride) = f;
*(__m64*)(_dst + 6 * _dst_ystride) = g;
*(__m64*)(_dst + 7 * _dst_ystride) = h;
}
void oc_state_frag_recon_mmx(oc_theora_state *_state,const oc_fragment *_frag,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,int _ncoefs,
ogg_uint16_t _dc_iquant,const ogg_uint16_t _ac_iquant[64]){
ogg_int16_t __declspec(align(16)) res_buf[64];
int dst_framei;
int dst_ystride;
int zzi;
/*_last_zzi is subtly different from an actual count of the number of
coefficients we decoded for this block.
It contains the value of zzi BEFORE the final token in the block was
decoded.
In most cases this is an EOB token (the continuation of an EOB run from a
previous block counts), and so this is the same as the coefficient count.
However, in the case that the last token was NOT an EOB token, but filled
the block up with exactly 64 coefficients, _last_zzi will be less than 64.
Provided the last token was not a pure zero run, the minimum value it can
be is 46, and so that doesn't affect any of the cases in this routine.
However, if the last token WAS a pure zero run of length 63, then _last_zzi
will be 1 while the number of coefficients decoded is 64.
Thus, we will trigger the following special case, where the real
coefficient count would not.
Note also that a zero run of length 64 will give _last_zzi a value of 0,
but we still process the DC coefficient, which might have a non-zero value
due to DC prediction.
Although convoluted, this is arguably the correct behavior: it allows us to
dequantize fewer coefficients and use a smaller transform when the block
ends with a long zero run instead of a normal EOB token.
It could be smarter... multiple separate zero runs at the end of a block
will fool it, but an encoder that generates these really deserves what it
gets.
Needless to say we inherited this approach from VP3.*/
/*Special case only having a DC component.*/
if(_last_zzi<2){
__m64 p;
/*Why is the iquant product rounded in this case and no others? Who knows.*/
p = _m_from_int((ogg_int32_t)_frag->dc*_dc_iquant+15>>5);
/* broadcast 16 bits into all 4 mmx subregisters */
p = _m_punpcklwd (p,p);
p = _m_punpckldq (p,p);
loc_fill_mmx_value ((__m64 *)res_buf, p);
}
else{
/*Then, fill in the remainder of the coefficients with 0's, and perform
the iDCT.*/
/*First zero the buffer.*/
/*On K7, etc., this could be replaced with movntq and sfence.*/
loc_fill_mmx_value ((__m64 *)res_buf, _mm_setzero_si64());
res_buf[0]=(ogg_int16_t)((ogg_int32_t)_frag->dc*_dc_iquant);
/*This is planned to be rewritten in MMX.*/
for(zzi=1;zzi<_ncoefs;zzi++)
{
int ci;
ci=OC_FZIG_ZAG[zzi];
res_buf[OC_FZIG_ZAGMMX[zzi]]=(ogg_int16_t)((ogg_int32_t)_dct_coeffs[zzi]*
_ac_iquant[ci]);
}
if(_last_zzi<10){
oc_idct8x8_10_mmx(res_buf);
}
else {
oc_idct8x8_mmx(res_buf);
}
}
/*Fill in the target buffer.*/
dst_framei=_state->ref_frame_idx[OC_FRAME_SELF];
dst_ystride=_state->ref_frame_bufs[dst_framei][_pli].stride;
/*For now ystride values in all ref frames assumed to be equal.*/
if(_frag->mbmode==OC_MODE_INTRA){
oc_frag_recon_intra_mmx(_frag->buffer[dst_framei],dst_ystride,res_buf);
}
else{
int ref_framei;
int ref_ystride;
int mvoffsets[2];
ref_framei=_state->ref_frame_idx[OC_FRAME_FOR_MODE[_frag->mbmode]];
ref_ystride=_state->ref_frame_bufs[ref_framei][_pli].stride;
if(oc_state_get_mv_offsets(_state,mvoffsets,_frag->mv[0],
_frag->mv[1],ref_ystride,_pli)>1){
oc_frag_recon_inter2_mmx(_frag->buffer[dst_framei],dst_ystride,
_frag->buffer[ref_framei]+mvoffsets[0],ref_ystride,
_frag->buffer[ref_framei]+mvoffsets[1],ref_ystride,res_buf);
}
else{
oc_frag_recon_inter_mmx(_frag->buffer[dst_framei],dst_ystride,
_frag->buffer[ref_framei]+mvoffsets[0],ref_ystride,res_buf);
}
}
_mm_empty();
}
void oc_state_frag_copy_mmx(const oc_theora_state *_state,const int *_fragis,
int _nfragis,int _dst_frame,int _src_frame,int _pli){
const int *fragi;
const int *fragi_end;
int dst_framei;
int dst_ystride;
int src_framei;
int src_ystride;
dst_framei=_state->ref_frame_idx[_dst_frame];
src_framei=_state->ref_frame_idx[_src_frame];
dst_ystride=_state->ref_frame_bufs[dst_framei][_pli].stride;
src_ystride=_state->ref_frame_bufs[src_framei][_pli].stride;
fragi_end=_fragis+_nfragis;
for(fragi=_fragis;fragi<fragi_end;fragi++){
oc_fragment *frag = _state->frags+*fragi;
loc_blockcopy_mmx (frag->buffer[dst_framei], dst_ystride,
frag->buffer[src_framei], src_ystride);
}
_m_empty();
}
#endif
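
The DC-only special case in oc_state_frag_recon_mmx above (taken when _last_zzi<2) skips the iDCT and fills the whole residue block with a single rounded value. The sketch below shows the same arithmetic without MMX intrinsics. The function name and the use of <stdint.h> types are assumptions made for illustration, and it assumes, as the original expression does, that right-shifting a negative value is an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Fill all 64 residue entries with the rounded, dequantized DC value.
   Mirrors (dc*dc_iquant+15)>>5 from the code above; a negative product
   relies on arithmetic right shift, which C leaves implementation-defined. */
static void fill_dc_only_residue(int16_t residue[64],int dc,
 uint16_t dc_iquant){
  int32_t p;
  int i;
  assert(residue!=0);
  p=((int32_t)dc*(int32_t)dc_iquant+15)>>5;
  for(i=0;i<64;i++)residue[i]=(int16_t)p;
}
```

The MMX version reaches the same result by broadcasting the 16-bit value into all four lanes of a register and storing that register 16 times.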

View file

@ -1,49 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: x86int.h 15400 2008-10-15 12:10:58Z tterribe $
********************************************************************/
#if !defined(_x86_x86int_vc_H)
# define _x86_x86int_vc_H (1)
# include "../../internal.h"
void oc_state_vtable_init_x86(oc_theora_state *_state);
void oc_frag_recon_intra_mmx(unsigned char *_dst,int _dst_ystride,
const ogg_int16_t *_residue);
void oc_frag_recon_inter_mmx(unsigned char *_dst,int _dst_ystride,
const unsigned char *_src,int _src_ystride,const ogg_int16_t *_residue);
void oc_frag_recon_inter2_mmx(unsigned char *_dst,int _dst_ystride,
const unsigned char *_src1,int _src1_ystride,const unsigned char *_src2,
int _src2_ystride,const ogg_int16_t *_residue);
void oc_state_frag_copy_mmx(const oc_theora_state *_state,const int *_fragis,
int _nfragis,int _dst_frame,int _src_frame,int _pli);
void oc_restore_fpu_mmx(void);
void oc_state_frag_recon_mmx(oc_theora_state *_state,const oc_fragment *_frag,
int _pli,ogg_int16_t _dct_coeffs[128],int _last_zzi,int _ncoefs,
ogg_uint16_t _dc_iquant,const ogg_uint16_t _ac_iquant[64]);
void oc_idct8x8_mmx(ogg_int16_t _y[64]);
void oc_idct8x8_10_mmx(ogg_int16_t _y[64]);
void oc_state_loop_filter_frag_rows_mmx(oc_theora_state *_state,int *_bv,
int _refi,int _pli,int _fragy0,int _fragy_end);
#endif

View file

@ -5,7 +5,7 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
@ -19,6 +19,7 @@
#include <string.h>
#include <limits.h>
#include "apiwrapper.h"
#include "decint.h"
#include "theora/theoradec.h"
static void th_dec_api_clear(th_api_wrapper *_api){
@ -47,7 +48,7 @@ static double theora_decode_granule_time(theora_state *_td,ogg_int64_t _gp){
return th_granule_time(((th_api_wrapper *)_td->i->codec_setup)->decode,_gp);
}
static const oc_state_dispatch_vtbl OC_DEC_DISPATCH_VTBL={
static const oc_state_dispatch_vtable OC_DEC_DISPATCH_VTBL={
(oc_state_clear_func)theora_decode_clear,
(oc_state_control_func)theora_decode_control,
(oc_state_granule_frame_func)theora_decode_granule_frame,
@ -95,6 +96,7 @@ int theora_decode_init(theora_state *_td,theora_info *_ci){
This avoids having to figure out whether or not we need to free the info
struct in either theora_info_clear() or theora_clear().*/
apiinfo=(th_api_info *)_ogg_calloc(1,sizeof(*apiinfo));
if(apiinfo==NULL)return OC_FAULT;
/*Make our own copy of the info struct, since its lifetime should be
independent of the one we were passed in.*/
*&apiinfo->info=*_ci;
@ -130,6 +132,7 @@ int theora_decode_header(theora_info *_ci,theora_comment *_cc,ogg_packet *_op){
theora_info struct like the ones that are used in a theora_state struct.*/
if(api==NULL){
_ci->codec_setup=_ogg_calloc(1,sizeof(*api));
if(_ci->codec_setup==NULL)return OC_FAULT;
api=(th_api_wrapper *)_ci->codec_setup;
api->clear=(oc_setup_clear_func)th_dec_api_clear;
}
@ -167,12 +170,14 @@ int theora_decode_packetin(theora_state *_td,ogg_packet *_op){
int theora_decode_YUVout(theora_state *_td,yuv_buffer *_yuv){
th_api_wrapper *api;
th_dec_ctx *decode;
th_ycbcr_buffer buf;
int ret;
if(!_td||!_td->i||!_td->i->codec_setup)return OC_FAULT;
api=(th_api_wrapper *)_td->i->codec_setup;
if(!api->decode)return OC_FAULT;
ret=th_decode_ycbcr_out(api->decode,buf);
decode=(th_dec_ctx *)api->decode;
if(!decode)return OC_FAULT;
ret=th_decode_ycbcr_out(decode,buf);
if(ret>=0){
_yuv->y_width=buf[0].width;
_yuv->y_height=buf[0].height;

View file

@ -5,13 +5,13 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: decinfo.c 15400 2008-10-15 12:10:58Z tterribe $
last mod: $Id: decinfo.c 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
@ -27,30 +27,30 @@
_opb: The pack buffer to read the octets from.
_buf: The byte array to store the unpacked bytes in.
_len: The number of octets to unpack.*/
static void oc_unpack_octets(oggpack_buffer *_opb,char *_buf,size_t _len){
static void oc_unpack_octets(oc_pack_buf *_opb,char *_buf,size_t _len){
while(_len-->0){
long val;
theorapackB_read(_opb,8,&val);
val=oc_pack_read(_opb,8);
*_buf++=(char)val;
}
}
/*Unpacks a 32-bit integer encoded by octets in little-endian form.*/
static long oc_unpack_length(oggpack_buffer *_opb){
static long oc_unpack_length(oc_pack_buf *_opb){
long ret[4];
int i;
for(i=0;i<4;i++)theorapackB_read(_opb,8,ret+i);
for(i=0;i<4;i++)ret[i]=oc_pack_read(_opb,8);
return ret[0]|ret[1]<<8|ret[2]<<16|ret[3]<<24;
}
static int oc_info_unpack(oggpack_buffer *_opb,th_info *_info){
static int oc_info_unpack(oc_pack_buf *_opb,th_info *_info){
long val;
/*Check the codec bitstream version.*/
theorapackB_read(_opb,8,&val);
val=oc_pack_read(_opb,8);
_info->version_major=(unsigned char)val;
theorapackB_read(_opb,8,&val);
val=oc_pack_read(_opb,8);
_info->version_minor=(unsigned char)val;
theorapackB_read(_opb,8,&val);
val=oc_pack_read(_opb,8);
_info->version_subminor=(unsigned char)val;
/*verify we can parse this bitstream version.
We accept earlier minors and all subminors, by spec*/
@ -60,25 +60,21 @@ static int oc_info_unpack(oggpack_buffer *_opb,th_info *_info){
return TH_EVERSION;
}
/*Read the encoded frame description.*/
theorapackB_read(_opb,16,&val);
val=oc_pack_read(_opb,16);
_info->frame_width=(ogg_uint32_t)val<<4;
theorapackB_read(_opb,16,&val);
val=oc_pack_read(_opb,16);
_info->frame_height=(ogg_uint32_t)val<<4;
theorapackB_read(_opb,24,&val);
val=oc_pack_read(_opb,24);
_info->pic_width=(ogg_uint32_t)val;
theorapackB_read(_opb,24,&val);
val=oc_pack_read(_opb,24);
_info->pic_height=(ogg_uint32_t)val;
theorapackB_read(_opb,8,&val);
val=oc_pack_read(_opb,8);
_info->pic_x=(ogg_uint32_t)val;
/*Note: The sense of pic_y is inverted in what we pass back to the
application compared to how it is stored in the bitstream.
This is because the bitstream uses a right-handed coordinate system, while
applications expect a left-handed one.*/
theorapackB_read(_opb,8,&val);
_info->pic_y=_info->frame_height-_info->pic_height-(ogg_uint32_t)val;
theorapackB_read(_opb,32,&val);
val=oc_pack_read(_opb,8);
_info->pic_y=(ogg_uint32_t)val;
val=oc_pack_read(_opb,32);
_info->fps_numerator=(ogg_uint32_t)val;
theorapackB_read(_opb,32,&val);
val=oc_pack_read(_opb,32);
_info->fps_denominator=(ogg_uint32_t)val;
if(_info->frame_width==0||_info->frame_height==0||
_info->pic_width+_info->pic_x>_info->frame_width||
@ -86,38 +82,46 @@ static int oc_info_unpack(oggpack_buffer *_opb,th_info *_info){
_info->fps_numerator==0||_info->fps_denominator==0){
return TH_EBADHEADER;
}
theorapackB_read(_opb,24,&val);
/*Note: The sense of pic_y is inverted in what we pass back to the
application compared to how it is stored in the bitstream.
This is because the bitstream uses a right-handed coordinate system, while
applications expect a left-handed one.*/
_info->pic_y=_info->frame_height-_info->pic_height-_info->pic_y;
val=oc_pack_read(_opb,24);
_info->aspect_numerator=(ogg_uint32_t)val;
theorapackB_read(_opb,24,&val);
val=oc_pack_read(_opb,24);
_info->aspect_denominator=(ogg_uint32_t)val;
theorapackB_read(_opb,8,&val);
val=oc_pack_read(_opb,8);
_info->colorspace=(th_colorspace)val;
theorapackB_read(_opb,24,&val);
val=oc_pack_read(_opb,24);
_info->target_bitrate=(int)val;
theorapackB_read(_opb,6,&val);
val=oc_pack_read(_opb,6);
_info->quality=(int)val;
theorapackB_read(_opb,5,&val);
val=oc_pack_read(_opb,5);
_info->keyframe_granule_shift=(int)val;
theorapackB_read(_opb,2,&val);
val=oc_pack_read(_opb,2);
_info->pixel_fmt=(th_pixel_fmt)val;
if(_info->pixel_fmt==TH_PF_RSVD)return TH_EBADHEADER;
if(theorapackB_read(_opb,3,&val)<0||val!=0)return TH_EBADHEADER;
val=oc_pack_read(_opb,3);
if(val!=0||oc_pack_bytes_left(_opb)<0)return TH_EBADHEADER;
return 0;
}
static int oc_comment_unpack(oggpack_buffer *_opb,th_comment *_tc){
static int oc_comment_unpack(oc_pack_buf *_opb,th_comment *_tc){
long len;
int i;
/*Read the vendor string.*/
len=oc_unpack_length(_opb);
if(len<0||theorapackB_bytes(_opb)+len>_opb->storage)return TH_EBADHEADER;
if(len<0||len>oc_pack_bytes_left(_opb))return TH_EBADHEADER;
_tc->vendor=_ogg_malloc((size_t)len+1);
if(_tc->vendor==NULL)return TH_EFAULT;
oc_unpack_octets(_opb,_tc->vendor,len);
_tc->vendor[len]='\0';
/*Read the user comments.*/
_tc->comments=(int)oc_unpack_length(_opb);
if(_tc->comments<0||_tc->comments>(LONG_MAX>>2)||
theorapackB_bytes(_opb)+((long)_tc->comments<<2)>_opb->storage){
len=_tc->comments;
if(len<0||len>(LONG_MAX>>2)||len<<2>oc_pack_bytes_left(_opb)){
_tc->comments=0;
return TH_EBADHEADER;
}
_tc->comment_lengths=(int *)_ogg_malloc(
@ -126,19 +130,23 @@ static int oc_comment_unpack(oggpack_buffer *_opb,th_comment *_tc){
_tc->comments*sizeof(_tc->user_comments[0]));
for(i=0;i<_tc->comments;i++){
len=oc_unpack_length(_opb);
if(len<0||theorapackB_bytes(_opb)+len>_opb->storage){
if(len<0||len>oc_pack_bytes_left(_opb)){
_tc->comments=i;
return TH_EBADHEADER;
}
_tc->comment_lengths[i]=len;
_tc->user_comments[i]=_ogg_malloc((size_t)len+1);
if(_tc->user_comments[i]==NULL){
_tc->comments=i;
return TH_EFAULT;
}
oc_unpack_octets(_opb,_tc->user_comments[i],len);
_tc->user_comments[i][len]='\0';
}
return theorapackB_read(_opb,0,&len)<0?TH_EBADHEADER:0;
return oc_pack_bytes_left(_opb)<0?TH_EBADHEADER:0;
}
static int oc_setup_unpack(oggpack_buffer *_opb,th_setup_info *_setup){
static int oc_setup_unpack(oc_pack_buf *_opb,th_setup_info *_setup){
int ret;
/*Read the quantizer tables.*/
ret=oc_quant_params_unpack(_opb,&_setup->qinfo);
@ -152,13 +160,13 @@ static void oc_setup_clear(th_setup_info *_setup){
oc_huff_trees_clear(_setup->huff_tables);
}
static int oc_dec_headerin(oggpack_buffer *_opb,th_info *_info,
static int oc_dec_headerin(oc_pack_buf *_opb,th_info *_info,
th_comment *_tc,th_setup_info **_setup,ogg_packet *_op){
char buffer[6];
long val;
int packtype;
int ret;
theorapackB_read(_opb,8,&val);
val=oc_pack_read(_opb,8);
packtype=(int)val;
/*If we're at a data packet and we have received all three headers, we're
done.*/
@ -198,6 +206,7 @@ static int oc_dec_headerin(oggpack_buffer *_opb,th_info *_info,
return TH_EBADHEADER;
}
setup=(oc_setup_info *)_ogg_calloc(1,sizeof(*setup));
if(setup==NULL)return TH_EFAULT;
ret=oc_setup_unpack(_opb,setup);
if(ret<0){
oc_setup_clear(setup);
@ -222,13 +231,11 @@ static int oc_dec_headerin(oggpack_buffer *_opb,th_info *_info,
stream until it returns 0.*/
int th_decode_headerin(th_info *_info,th_comment *_tc,
th_setup_info **_setup,ogg_packet *_op){
oggpack_buffer opb;
int ret;
oc_pack_buf opb;
if(_op==NULL)return TH_EBADHEADER;
if(_info==NULL)return TH_EFAULT;
theorapackB_readinit(&opb,_op->packet,_op->bytes);
ret=oc_dec_headerin(&opb,_info,_tc,_setup,_op);
return ret;
oc_pack_readinit(&opb,_op->packet,_op->bytes);
return oc_dec_headerin(&opb,_info,_tc,_setup,_op);
}
void th_setup_free(th_setup_info *_setup){
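Reviewer note on the length checks changed in this file: the old form added the current read position to `len` before comparing against the buffer size, and that sum can overflow when `len` is very large. The new form compares `len` directly against the bytes remaining. The sketch below restates the two new checks as standalone predicates. The function names are hypothetical, and the 4-bytes-per-comment minimum is inferred from the `len<<2` in the diff, presumably the size of each comment's length field.

```c
#include <limits.h>

/*Hypothetical predicate: does a length field read from the packet fit in
   the bytes that remain?
  No addition is performed, so a huge len cannot overflow.*/
int length_fits(long len,long bytes_left){
  return len>=0&&len<=bytes_left;
}

/*Hypothetical predicate: can the packet hold `count` comments, assuming each
   one needs at least 4 bytes?
  The LONG_MAX>>2 test runs first so that the shift cannot overflow.*/
int comment_count_fits(long count,long bytes_left){
  return count>=0&&count<=(LONG_MAX>>2)&&(count<<2)<=bytes_left;
}
```

Both predicates also reject a negative `bytes_left`, which matches a reader that reports having run past the end of the packet with a negative count.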


@ -5,13 +5,13 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: decint.h 15400 2008-10-15 12:10:58Z tterribe $
last mod: $Id: decint.h 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
@ -19,13 +19,12 @@
#if !defined(_decint_H)
# define _decint_H (1)
# include "theora/theoradec.h"
# include "../internal.h"
# include "internal.h"
# include "bitpack.h"
typedef struct th_setup_info oc_setup_info;
typedef struct th_dec_ctx oc_dec_ctx;
# include "idct.h"
# include "huffdec.h"
# include "dequant.h"
@ -54,24 +53,20 @@ struct th_dec_ctx{
when a frame has been processed and a data packet is ready.*/
int packet_state;
/*Buffer in which to assemble packets.*/
oggpack_buffer opb;
oc_pack_buf opb;
/*Huffman decode trees.*/
oc_huff_node *huff_tables[TH_NHUFFMAN_TABLES];
/*The index of one past the last token in each plane for each coefficient.
The final entries are the total number of tokens for each coefficient.*/
int ti0[3][64];
/*The index of one past the last extra bits entry in each plane for each
coefficient.
The final entries are the total number of extra bits entries for each
coefficient.*/
int ebi0[3][64];
/*The index of the first token in each plane for each coefficient.*/
ptrdiff_t ti0[3][64];
/*The number of outstanding EOB runs at the start of each coefficient in each
plane.*/
int eob_runs[3][64];
ptrdiff_t eob_runs[3][64];
/*The DCT token lists.*/
unsigned char **dct_tokens;
unsigned char *dct_tokens;
/*The extra bits associated with DCT tokens.*/
ogg_uint16_t **extra_bits;
unsigned char *extra_bits;
/*The number of dct tokens unpacked so far.*/
int dct_tokens_count;
/*The out-of-loop post-processing level.*/
int pp_level;
/*The DC scale used for out-of-loop deblocking.*/
@ -85,11 +80,28 @@ struct th_dec_ctx{
/*The storage for the post-processed frame buffer.*/
unsigned char *pp_frame_data;
/*Whether or not the post-processed frame buffer has space for chroma.*/
int pp_frame_has_chroma;
/*The buffer used for the post-processed frame.*/
int pp_frame_state;
/*The buffer used for the post-processed frame.
Note that this is _not_ guaranteed to have the same strides and offsets as
the reference frame buffers.*/
th_ycbcr_buffer pp_frame_buf;
/*The striped decode callback function.*/
th_stripe_callback stripe_cb;
# if defined(HAVE_CAIRO)
/*Output metrics for debugging.*/
int telemetry;
int telemetry_mbmode;
int telemetry_mv;
int telemetry_qi;
int telemetry_bits;
int telemetry_frame_bytes;
int telemetry_coding_bytes;
int telemetry_mode_bytes;
int telemetry_mv_bytes;
int telemetry_qi_bytes;
int telemetry_dc_bytes;
unsigned char *telemetry_frame_data;
# endif
};
#endif

File diff suppressed because it is too large


@ -5,13 +5,13 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dequant.c 15400 2008-10-15 12:10:58Z tterribe $
last mod: $Id: dequant.c 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
@ -21,8 +21,7 @@
#include "dequant.h"
#include "decint.h"
int oc_quant_params_unpack(oggpack_buffer *_opb,
th_quant_info *_qinfo){
int oc_quant_params_unpack(oc_pack_buf *_opb,th_quant_info *_qinfo){
th_quant_base *base_mats;
long val;
int nbase_mats;
@ -36,30 +35,31 @@ int oc_quant_params_unpack(oggpack_buffer *_opb,
int qri;
int qi;
int i;
theorapackB_read(_opb,3,&val);
val=oc_pack_read(_opb,3);
nbits=(int)val;
for(qi=0;qi<64;qi++){
theorapackB_read(_opb,nbits,&val);
val=oc_pack_read(_opb,nbits);
_qinfo->loop_filter_limits[qi]=(unsigned char)val;
}
theorapackB_read(_opb,4,&val);
val=oc_pack_read(_opb,4);
nbits=(int)val+1;
for(qi=0;qi<64;qi++){
theorapackB_read(_opb,nbits,&val);
val=oc_pack_read(_opb,nbits);
_qinfo->ac_scale[qi]=(ogg_uint16_t)val;
}
theorapackB_read(_opb,4,&val);
val=oc_pack_read(_opb,4);
nbits=(int)val+1;
for(qi=0;qi<64;qi++){
theorapackB_read(_opb,nbits,&val);
val=oc_pack_read(_opb,nbits);
_qinfo->dc_scale[qi]=(ogg_uint16_t)val;
}
theorapackB_read(_opb,9,&val);
val=oc_pack_read(_opb,9);
nbase_mats=(int)val+1;
base_mats=_ogg_malloc(nbase_mats*sizeof(base_mats[0]));
if(base_mats==NULL)return TH_EFAULT;
for(bmi=0;bmi<nbase_mats;bmi++){
for(ci=0;ci<64;ci++){
theorapackB_read(_opb,8,&val);
val=oc_pack_read(_opb,8);
base_mats[bmi][ci]=(unsigned char)val;
}
}
@ -72,12 +72,12 @@ int oc_quant_params_unpack(oggpack_buffer *_opb,
pli=i%3;
qranges=_qinfo->qi_ranges[qti]+pli;
if(i>0){
theorapackB_read1(_opb,&val);
val=oc_pack_read1(_opb);
if(!val){
int qtj;
int plj;
if(qti>0){
theorapackB_read1(_opb,&val);
val=oc_pack_read1(_opb);
if(val){
qtj=qti-1;
plj=pli;
@ -95,13 +95,13 @@ int oc_quant_params_unpack(oggpack_buffer *_opb,
continue;
}
}
theorapackB_read(_opb,nbits,&val);
val=oc_pack_read(_opb,nbits);
indices[0]=(int)val;
for(qi=qri=0;qi<63;){
theorapackB_read(_opb,oc_ilog(62-qi),&val);
val=oc_pack_read(_opb,oc_ilog(62-qi));
sizes[qri]=(int)val+1;
qi+=(int)val+1;
theorapackB_read(_opb,nbits,&val);
val=oc_pack_read(_opb,nbits);
indices[++qri]=(int)val;
}
/*Note: The caller is responsible for cleaning up any partially
@ -112,8 +112,20 @@ int oc_quant_params_unpack(oggpack_buffer *_opb,
}
qranges->nranges=qri;
qranges->sizes=qrsizes=(int *)_ogg_malloc(qri*sizeof(qrsizes[0]));
if(qranges->sizes==NULL){
/*Note: The caller is responsible for cleaning up any partially
constructed qinfo.*/
_ogg_free(base_mats);
return TH_EFAULT;
}
memcpy(qrsizes,sizes,qri*sizeof(qrsizes[0]));
qrbms=(th_quant_base *)_ogg_malloc((qri+1)*sizeof(qrbms[0]));
if(qrbms==NULL){
/*Note: The caller is responsible for cleaning up any partially
constructed qinfo.*/
_ogg_free(base_mats);
return TH_EFAULT;
}
qranges->base_matrices=(const th_quant_base *)qrbms;
do{
bmi=indices[qri];
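Reviewer note on the quantizer range loop in this file: each range size is read with just enough bits to express the number of quantizer indices still unassigned, so later sizes cost fewer bits. The sketch below reproduces only that bit-width bookkeeping. `ilog_sketch` is a hypothetical stand-in for `oc_ilog()`, assumed to return the number of bits needed to represent its argument, and treating an overshoot past 63 as an error is an assumption about the lines elided from the hunk.

```c
/*Hypothetical stand-in for oc_ilog(): the number of bits needed to
   represent v, with 0 bits for v==0.*/
int ilog_sketch(unsigned v){
  int ret;
  for(ret=0;v;ret++)v>>=1;
  return ret;
}

/*For a list of range sizes (each at least 1), records how many bits would
   be used to code each size.
  Returns the number of ranges if they sum to exactly 63, or -1 otherwise.*/
int range_size_bits(const int *sizes,int nsizes,int *bits){
  int qi;
  int qri;
  for(qi=qri=0;qi<63&&qri<nsizes;qri++){
    /*62-qi is the largest value (size-1) that can still be coded.*/
    bits[qri]=ilog_sketch((unsigned)(62-qi));
    qi+=sizes[qri];
  }
  return qi==63?qri:-1;
}
```

With sizes {62,1} the second size is coded in 0 bits, because only one index is left and its size is implied.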


@ -5,21 +5,22 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dequant.h 15400 2008-10-15 12:10:58Z tterribe $
last mod: $Id: dequant.h 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
#if !defined(_dequant_H)
# define _dequant_H (1)
# include "quant.h"
# include "bitpack.h"
int oc_quant_params_unpack(oggpack_buffer *_opb,
int oc_quant_params_unpack(oc_pack_buf *_opb,
th_quant_info *_qinfo);
void oc_quant_params_clear(th_quant_info *_qinfo);


@ -1,37 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: block_inline.h 14059 2007-10-28 23:43:27Z xiphmont $
********************************************************************/
#include "codec_internal.h"
static const ogg_int32_t MBOrderMap[4] = { 0, 2, 3, 1 };
static const ogg_int32_t BlockOrderMap1[4][4] = {
{ 0, 1, 3, 2 },
{ 0, 2, 3, 1 },
{ 0, 2, 3, 1 },
{ 3, 2, 0, 1 }
};
static ogg_int32_t QuadMapToIndex1( ogg_int32_t (*BlockMap)[4][4],
ogg_uint32_t SB, ogg_uint32_t MB,
ogg_uint32_t B ){
return BlockMap[SB][MBOrderMap[MB]][BlockOrderMap1[MB][B]];
}
static ogg_int32_t QuadMapToMBTopLeft( ogg_int32_t (*BlockMap)[4][4],
ogg_uint32_t SB, ogg_uint32_t MB ){
return BlockMap[SB][MBOrderMap[MB]][0];
}


@ -1,99 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: blockmap.c 14059 2007-10-28 23:43:27Z xiphmont $
********************************************************************/
#include "codec_internal.h"
static void CreateMapping ( ogg_int32_t (*BlockMap)[4][4],
ogg_uint32_t FirstSB,
ogg_uint32_t FirstFrag, ogg_uint32_t HFrags,
ogg_uint32_t VFrags ){
ogg_uint32_t i, j = 0;
ogg_uint32_t xpos;
ogg_uint32_t ypos;
ogg_uint32_t SBrow, SBcol;
ogg_uint32_t SBRows, SBCols;
ogg_uint32_t MB, B;
ogg_uint32_t SB=FirstSB;
ogg_uint32_t FragIndex=FirstFrag;
/* Set Super-Block dimensions */
SBRows = VFrags/4 + ( VFrags%4 ? 1 : 0 );
SBCols = HFrags/4 + ( HFrags%4 ? 1 : 0 );
/* Map each Super-Block */
for ( SBrow=0; SBrow<SBRows; SBrow++ ){
for ( SBcol=0; SBcol<SBCols; SBcol++ ){
/* Y co-ordinate of Super-Block in Block units */
ypos = SBrow<<2;
/* Map Blocks within this Super-Block */
for ( i=0; (i<4) && (ypos<VFrags); i++, ypos++ ){
/* X co-ordinate of Super-Block in Block units */
xpos = SBcol<<2;
for ( j=0; (j<4) && (xpos<HFrags); j++, xpos++ ){
if ( i<2 ){
MB = ( j<2 ? 0 : 1 );
}else{
MB = ( j<2 ? 2 : 3 );
}
if ( i%2 ){
B = ( j%2 ? 3 : 2 );
}else{
B = ( j%2 ? 1 : 0 );
}
/* Set mapping and move to next fragment */
BlockMap[SB][MB][B] = FragIndex++;
}
/* Move to first fragment in next row in Super-Block */
FragIndex += HFrags-j;
}
/* Move on to next Super-Block */
SB++;
FragIndex -= i*HFrags-j;
}
/* Move to first Super-Block in next row */
FragIndex += 3*HFrags;
}
}
void CreateBlockMapping ( ogg_int32_t (*BlockMap)[4][4],
ogg_uint32_t YSuperBlocks,
ogg_uint32_t UVSuperBlocks,
ogg_uint32_t HFrags, ogg_uint32_t VFrags ) {
ogg_uint32_t i, j;
for ( i=0; i<YSuperBlocks + UVSuperBlocks * 2; i++ ){
for ( j=0; j<4; j++ ) {
BlockMap[i][j][0] = -1;
BlockMap[i][j][1] = -1;
BlockMap[i][j][2] = -1;
BlockMap[i][j][3] = -1;
}
}
CreateMapping ( BlockMap, 0, 0, HFrags, VFrags );
CreateMapping ( BlockMap, YSuperBlocks, HFrags*VFrags, HFrags/2, VFrags/2 );
CreateMapping ( BlockMap, YSuperBlocks + UVSuperBlocks, (HFrags*VFrags*5)/4,
HFrags/2, VFrags/2 );
}
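Reviewer note on the removed blockmap.c: the mapping rests on two small calculations, a round-up division by 4 for the superblock counts and a quadrant lookup that turns a block's row and column inside a superblock into a macroblock index and a block index. The sketch below isolates both. The function names are hypothetical and the code is a restatement for illustration, not the removed implementation.

```c
/*Hypothetical helper: the number of 4-fragment-wide superblocks needed to
   cover `frags` fragments, rounding up.*/
unsigned superblocks_for(unsigned frags){
  return frags/4+(frags%4?1:0);
}

/*Hypothetical helper: for row i and column j (each 0..3) inside one
   superblock, computes the macroblock index and the block index within
   that macroblock, as in the nested conditionals of CreateMapping().*/
void block_position(unsigned i,unsigned j,unsigned *mb,unsigned *b){
  *mb=(i<2?0:2)+(j<2?0:1);
  *b=(i%2?2:0)+(j%2?1:0);
}
```

For instance, 45 fragments need 12 superblocks, and the block at row 1, column 2 falls in macroblock 1 as block 2.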


@ -1,842 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2005 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: codec_internal.h 15153 2008-08-04 18:37:55Z tterribe $
********************************************************************/
#ifndef ENCODER_INTERNAL_H
#define ENCODER_INTERNAL_H
#ifdef HAVE_CONFIG_H
# include "config.h"
#endif
typedef struct PB_INSTANCE PB_INSTANCE;
#include "dsp.h"
#include "theora/theora.h"
#include "encoder_huffman.h"
#define theora_read(x,y,z) ( *z = oggpackB_read(x,y) )
#define CURRENT_ENCODE_VERSION 1
#define HUGE_ERROR (1<<28) /* Out of range test value */
/* Baseline dct height and width. */
#define BLOCK_HEIGHT_WIDTH 8
#define HFRAGPIXELS 8
#define VFRAGPIXELS 8
/* Blocks on INTRA/INTER Y/U/V planes */
enum BlockMode {
BLOCK_Y,
BLOCK_U,
BLOCK_V,
BLOCK_INTER_Y,
BLOCK_INTER_U,
BLOCK_INTER_V
};
/* Baseline dct block size */
#define BLOCK_SIZE (BLOCK_HEIGHT_WIDTH * BLOCK_HEIGHT_WIDTH)
/* Border is for unrestricted mv's */
#define UMV_BORDER 16
#define STRIDE_EXTRA (UMV_BORDER * 2)
#define Q_TABLE_SIZE 64
#define KEY_FRAME 0
#define DELTA_FRAME 1
#define MAX_MODES 8
#define MODE_BITS 3
#define MODE_METHODS 8
#define MODE_METHOD_BITS 3
/* Different key frame types/methods */
#define DCT_KEY_FRAME 0
#define KEY_FRAME_CONTEXT 5
/* Preprocessor defines */
#define MAX_PREV_FRAMES 16
/* Number of search sites for a 4-step search (at pixel accuracy) */
#define MAX_SEARCH_SITES 33
#define VERY_BEST_Q 10
#define MIN_BPB_FACTOR 0.3
#define MAX_BPB_FACTOR 3.0
#define MAX_MV_EXTENT 31 /* Max search distance in half pixel increments */
typedef struct CONFIG_TYPE2{
double OutputFrameRate;
ogg_uint32_t TargetBandwidth;
ogg_uint32_t KeyFrameDataTarget ; /* Data rate target for key frames */
ogg_uint32_t FirstFrameQ;
ogg_uint32_t BaseQ;
ogg_uint32_t MaxQ; /* Absolute Max Q allowed. */
ogg_uint32_t ActiveMaxQ; /* Currently active Max Q */
} CONFIG_TYPE2;
typedef struct coeffNode{
int i;
struct coeffNode *next;
} COEFFNODE;
typedef struct{
unsigned char * Yuv0ptr;
unsigned char * Yuv1ptr;
unsigned char * SrfWorkSpcPtr;
unsigned char * disp_fragments;
ogg_uint32_t * RegionIndex; /* Gives pixel index for top left of
each block */
ogg_uint32_t VideoFrameHeight;
ogg_uint32_t VideoFrameWidth;
} SCAN_CONFIG_DATA;
typedef unsigned char YUV_BUFFER_ENTRY;
typedef struct{
ogg_int32_t x;
ogg_int32_t y;
} MOTION_VECTOR;
typedef MOTION_VECTOR COORDINATE;
/** Quantizer matrix entry */
typedef ogg_int16_t Q_LIST_ENTRY;
/** Decode Post-Processor instance */
typedef struct PP_INSTANCE {
ogg_uint32_t PrevFrameLimit;
ogg_uint32_t *ScanPixelIndexTable;
signed char *ScanDisplayFragments;
signed char *PrevFragments[MAX_PREV_FRAMES];
ogg_uint32_t *FragScores; /* The individual frame difference ratings. */
signed char *SameGreyDirPixels;
signed char *BarBlockMap;
/* Number of pixels changed by diff threshold in row of a fragment. */
unsigned char *FragDiffPixels;
unsigned char *PixelScores;
unsigned char *PixelChangedMap;
unsigned char *ChLocals;
ogg_int16_t *yuv_differences;
ogg_int32_t *RowChangedPixels;
signed char *TmpCodedMap;
/* Plane pointers and dimension variables */
unsigned char * YPlanePtr0;
unsigned char * YPlanePtr1;
unsigned char * UPlanePtr0;
unsigned char * UPlanePtr1;
unsigned char * VPlanePtr0;
unsigned char * VPlanePtr1;
ogg_uint32_t VideoYPlaneWidth;
ogg_uint32_t VideoYPlaneHeight;
ogg_uint32_t VideoUVPlaneWidth;
ogg_uint32_t VideoUVPlaneHeight;
ogg_uint32_t VideoYPlaneStride;
ogg_uint32_t VideoUPlaneStride;
ogg_uint32_t VideoVPlaneStride;
/* Scan control variables. */
unsigned char HFragPixels;
unsigned char VFragPixels;
ogg_uint32_t ScanFrameFragments;
ogg_uint32_t ScanYPlaneFragments;
ogg_uint32_t ScanUVPlaneFragments;
ogg_uint32_t ScanHFragments;
ogg_uint32_t ScanVFragments;
ogg_uint32_t YFramePixels;
ogg_uint32_t UVFramePixels;
ogg_uint32_t SgcThresh;
ogg_uint32_t OutputBlocksUpdated;
ogg_uint32_t KFIndicator;
/* The pre-processor scan configuration. */
SCAN_CONFIG_DATA ScanConfig;
ogg_int32_t SRFGreyThresh;
ogg_int32_t SRFColThresh;
ogg_int32_t SgcLevelThresh;
ogg_int32_t SuvcLevelThresh;
ogg_uint32_t NoiseSupLevel;
/* Block Thresholds. */
ogg_uint32_t PrimaryBlockThreshold;
unsigned char LineSearchTripTresh;
int PAKEnabled;
int LevelThresh;
int NegLevelThresh;
int SrfThresh;
int NegSrfThresh;
int HighChange;
int NegHighChange;
/* Threshold lookup tables */
unsigned char SrfPakThreshTable[512];
unsigned char SrfThreshTable[512];
unsigned char SgcThreshTable[512];
/* Variables controlling S.A.D. break outs. */
ogg_uint32_t GrpLowSadThresh;
ogg_uint32_t GrpHighSadThresh;
ogg_uint32_t ModifiedGrpLowSadThresh;
ogg_uint32_t ModifiedGrpHighSadThresh;
ogg_int32_t PlaneHFragments;
ogg_int32_t PlaneVFragments;
ogg_int32_t PlaneHeight;
ogg_int32_t PlaneWidth;
ogg_int32_t PlaneStride;
ogg_uint32_t BlockThreshold;
ogg_uint32_t BlockSgcThresh;
double UVBlockThreshCorrection;
double UVSgcCorrection;
double YUVPlaneCorrectionFactor;
double AbsDiff_ScoreMultiplierTable[256];
unsigned char NoiseScoreBoostTable[256];
unsigned char MaxLineSearchLen;
ogg_int32_t YuvDiffsCircularBufferSize;
ogg_int32_t ChLocalsCircularBufferSize;
ogg_int32_t PixelMapCircularBufferSize;
DspFunctions dsp; /* Selected functions for this platform */
} PP_INSTANCE;
/** block coding modes */
typedef enum{
CODE_INTER_NO_MV = 0x0, /* INTER prediction, (0,0) motion
vector implied. */
CODE_INTRA = 0x1, /* INTRA i.e. no prediction. */
CODE_INTER_PLUS_MV = 0x2, /* INTER prediction, non zero motion
vector. */
CODE_INTER_LAST_MV = 0x3, /* Use Last Motion vector */
CODE_INTER_PRIOR_LAST = 0x4, /* Prior last motion vector */
CODE_USING_GOLDEN = 0x5, /* 'Golden frame' prediction (no MV). */
CODE_GOLDEN_MV = 0x6, /* 'Golden frame' prediction plus MV. */
CODE_INTER_FOURMV = 0x7 /* Inter prediction 4MV per macro block. */
} CODING_MODE;
/** Huffman table entry */
typedef struct HUFF_ENTRY {
struct HUFF_ENTRY *ZeroChild;
struct HUFF_ENTRY *OneChild;
struct HUFF_ENTRY *Previous;
struct HUFF_ENTRY *Next;
ogg_int32_t Value;
ogg_uint32_t Frequency;
} HUFF_ENTRY;
typedef struct qmat_range_table {
int startq, startqi; /* index where this range starts */
Q_LIST_ENTRY *qmat; /* qmat at this range boundary */
} qmat_range_table;
/** codec setup data, maps to the third bitstream header */
typedef struct codec_setup_info {
ogg_uint32_t QThreshTable[Q_TABLE_SIZE];
Q_LIST_ENTRY DcScaleFactorTable[Q_TABLE_SIZE];
int MaxQMatrixIndex;
Q_LIST_ENTRY *qmats;
qmat_range_table *range_table[6];
HUFF_ENTRY *HuffRoot[NUM_HUFF_TABLES];
} codec_setup_info;
/** Decoder (Playback) instance -- installed in a theora_state */
struct PB_INSTANCE {
oggpack_buffer *opb;
theora_info info;
/* flag to indicate if the headers already have been written */
int HeadersWritten;
/* how far do we shift the granulepos to separate out P frame counts? */
int keyframe_granule_shift;
/***********************************************************************/
/* Decoder and Frame Type Information */
int DecoderErrorCode;
int FramesHaveBeenSkipped;
int PostProcessEnabled;
ogg_uint32_t PostProcessingLevel; /* Perform post processing */
/* Frame Info */
CODING_MODE CodingMode;
unsigned char FrameType;
unsigned char KeyFrameType;
ogg_uint32_t QualitySetting;
ogg_uint32_t FrameQIndex; /* Quality specified as a
table index */
ogg_uint32_t ThisFrameQualityValue; /* Quality value for this frame */
ogg_uint32_t LastFrameQualityValue; /* Last Frame's Quality */
ogg_int32_t CodedBlockIndex; /* Number of Coded Blocks */
ogg_uint32_t CodedBlocksThisFrame; /* Index into coded blocks */
ogg_uint32_t FrameSize; /* The number of bytes in the frame. */
/**********************************************************************/
/* Frame Size & Index Information */
ogg_uint32_t YPlaneSize;
ogg_uint32_t UVPlaneSize;
ogg_uint32_t YStride;
ogg_uint32_t UVStride;
ogg_uint32_t VFragments;
ogg_uint32_t HFragments;
ogg_uint32_t UnitFragments;
ogg_uint32_t YPlaneFragments;
ogg_uint32_t UVPlaneFragments;
ogg_uint32_t ReconYPlaneSize;
ogg_uint32_t ReconUVPlaneSize;
ogg_uint32_t YDataOffset;
ogg_uint32_t UDataOffset;
ogg_uint32_t VDataOffset;
ogg_uint32_t ReconYDataOffset;
ogg_uint32_t ReconUDataOffset;
ogg_uint32_t ReconVDataOffset;
ogg_uint32_t YSuperBlocks; /* Number of SuperBlocks in a Y frame */
ogg_uint32_t UVSuperBlocks; /* Number of SuperBlocks in a U or V frame */
ogg_uint32_t SuperBlocks; /* Total number of SuperBlocks in a
Y,U,V frame */
ogg_uint32_t YSBRows; /* Number of rows of SuperBlocks in a
Y frame */
ogg_uint32_t YSBCols; /* Number of cols of SuperBlocks in a
Y frame */
ogg_uint32_t UVSBRows; /* Number of rows of SuperBlocks in a
U or V frame */
ogg_uint32_t UVSBCols; /* Number of cols of SuperBlocks in a
U or V frame */
ogg_uint32_t MacroBlocks; /* Total number of Macro-Blocks */
/**********************************************************************/
/* Frames */
YUV_BUFFER_ENTRY *ThisFrameRecon;
YUV_BUFFER_ENTRY *GoldenFrame;
YUV_BUFFER_ENTRY *LastFrameRecon;
YUV_BUFFER_ENTRY *PostProcessBuffer;
/**********************************************************************/
/* Fragment Information */
ogg_uint32_t *pixel_index_table; /* start address of first
pixel of fragment in
source */
ogg_uint32_t *recon_pixel_index_table; /* start address of first
pixel in recon buffer */
unsigned char *display_fragments; /* Fragment update map */
unsigned char *skipped_display_fragments;/* whether fragment YUV
Conversion and update is to be
skipped */
ogg_int32_t *CodedBlockList; /* A list of fragment indices for
coded blocks. */
MOTION_VECTOR *FragMVect; /* Frag motion vectors */
ogg_uint32_t *FragTokenCounts; /* Number of tokens per fragment */
ogg_uint32_t (*TokenList)[128]; /* Fragment Token Pointers */
ogg_int32_t *FragmentVariances;
ogg_uint32_t *FragQIndex; /* Fragment Quality used in
PostProcess */
Q_LIST_ENTRY (*PPCoefBuffer)[64]; /* PostProcess Buffer for
coefficients data */
unsigned char *FragCoeffs; /* # of coeffs decoded so far for
fragment */
unsigned char *FragCoefEOB; /* Position of last non 0 coef
within QFragData */
Q_LIST_ENTRY (*QFragData)[64]; /* Fragment Coefficients
Array Pointers */
CODING_MODE *FragCodingMethod; /* coding method for the
fragment */
/***********************************************************************/
/* pointers to addresses used for allocation and deallocation the
others are rounded up to the nearest 32 bytes */
COEFFNODE *_Nodes;
ogg_uint32_t *transIndex; /* ptr to table of
transposed indexes */
/***********************************************************************/
ogg_int32_t bumpLast;
/* Macro Block and SuperBlock Information */
ogg_int32_t (*BlockMap)[4][4]; /* super block + sub macro
block + sub frag ->
FragIndex */
/* Coded flag arrays and counters for them */
unsigned char *SBCodedFlags;
unsigned char *SBFullyFlags;
unsigned char *MBCodedFlags;
unsigned char *MBFullyFlags;
/**********************************************************************/
ogg_uint32_t EOB_Run;
COORDINATE *FragCoordinates;
MOTION_VECTOR MVector;
ogg_int32_t ReconPtr2Offset; /* Offset for second reconstruction
in half pixel MC */
Q_LIST_ENTRY *quantized_list;
ogg_int16_t *ReconDataBuffer;
Q_LIST_ENTRY InvLastIntraDC;
Q_LIST_ENTRY InvLastInterDC;
Q_LIST_ENTRY LastIntraDC;
Q_LIST_ENTRY LastInterDC;
ogg_uint32_t BlocksToDecode; /* Blocks to be decoded this frame */
ogg_uint32_t DcHuffChoice; /* Huffman table selection variables */
unsigned char ACHuffChoice;
ogg_uint32_t QuadMBListIndex;
ogg_int32_t ByteCount;
ogg_uint32_t bit_pattern;
unsigned char bits_so_far;
unsigned char NextBit;
ogg_int32_t BitsLeft;
ogg_int16_t *DequantBuffer;
ogg_int32_t fp_quant_InterUV_coeffs[64];
ogg_int32_t fp_quant_InterUV_round[64];
ogg_int32_t fp_ZeroBinSize_InterUV[64];
ogg_int16_t *TmpReconBuffer;
ogg_int16_t *TmpDataBuffer;
/* Loop filter bounding values */
ogg_int16_t FiltBoundingValue[256];
/* Naming convention for all quant matrices and related data structures:
* Fields containing "Inter" in their name are for Inter frames, the
* rest is Intra. */
/* Dequantiser and rounding tables */
ogg_uint16_t *QThreshTable;
Q_LIST_ENTRY dequant_Y_coeffs[64];
Q_LIST_ENTRY dequant_U_coeffs[64];
Q_LIST_ENTRY dequant_V_coeffs[64];
Q_LIST_ENTRY dequant_InterY_coeffs[64];
Q_LIST_ENTRY dequant_InterU_coeffs[64];
Q_LIST_ENTRY dequant_InterV_coeffs[64];
Q_LIST_ENTRY *dequant_coeffs; /* currently active quantizer */
unsigned int zigzag_index[64];
HUFF_ENTRY *HuffRoot_VP3x[NUM_HUFF_TABLES];
ogg_uint32_t *HuffCodeArray_VP3x[NUM_HUFF_TABLES];
unsigned char *HuffCodeLengthArray_VP3x[NUM_HUFF_TABLES];
const unsigned char *ExtraBitLengths_VP3x;
th_quant_info quant_info;
oc_quant_tables quant_tables[2][3];
/* Quantiser and rounding tables */
/* this is scheduled to be replaced by a new mechanism
that will simply reuse the dequantizer information. */
ogg_int32_t fp_quant_Y_coeffs[64]; /* used in reiniting quantizers */
ogg_int32_t fp_quant_U_coeffs[64];
ogg_int32_t fp_quant_V_coeffs[64];
ogg_int32_t fp_quant_Inter_Y_coeffs[64];
ogg_int32_t fp_quant_Inter_U_coeffs[64];
ogg_int32_t fp_quant_Inter_V_coeffs[64];
ogg_int32_t fp_quant_Y_round[64];
ogg_int32_t fp_quant_U_round[64];
ogg_int32_t fp_quant_V_round[64];
ogg_int32_t fp_quant_Inter_Y_round[64];
ogg_int32_t fp_quant_Inter_U_round[64];
ogg_int32_t fp_quant_Inter_V_round[64];
ogg_int32_t fp_ZeroBinSize_Y[64];
ogg_int32_t fp_ZeroBinSize_U[64];
ogg_int32_t fp_ZeroBinSize_V[64];
ogg_int32_t fp_ZeroBinSize_Inter_Y[64];
ogg_int32_t fp_ZeroBinSize_Inter_U[64];
ogg_int32_t fp_ZeroBinSize_Inter_V[64];
ogg_int32_t *fquant_coeffs;
ogg_int32_t *fquant_round;
ogg_int32_t *fquant_ZbSize;
/* Predictor used in choosing entropy table for decoding block patterns. */
unsigned char BlockPatternPredictor;
short Modifier[4][512];
short *ModifierPointer[4];
unsigned char *DataOutputInPtr;
DspFunctions dsp; /* Selected functions for this platform */
};
/* Encoder (Compressor) instance -- installed in a theora_state */
typedef struct CP_INSTANCE {
/*This structure must be first.
It contains entry points accessed by the decoder library's API wrapper, and
is the only assumption that library makes about our internal format.*/
oc_state_dispatch_vtbl dispatch_vtbl;
/* Compressor Configuration */
SCAN_CONFIG_DATA ScanConfig;
CONFIG_TYPE2 Configuration;
int GoldenFrameEnabled;
int InterPrediction;
int MotionCompensation;
ogg_uint32_t LastKeyFrame ;
ogg_int32_t DropCount ;
ogg_int32_t MaxConsDroppedFrames ;
ogg_int32_t DropFrameTriggerBytes;
int DropFrameCandidate;
/* Compressor Statistics */
double TotErrScore;
ogg_int64_t KeyFrameCount; /* Count of key frames. */
ogg_int64_t TotKeyFrameBytes;
ogg_uint32_t LastKeyFrameSize;
ogg_uint32_t PriorKeyFrameSize[KEY_FRAME_CONTEXT];
ogg_uint32_t PriorKeyFrameDistance[KEY_FRAME_CONTEXT];
ogg_int32_t FrameQuality[6];
int DecoderErrorCode; /* Decoder error flag. */
ogg_int32_t ThreshMapThreshold;
ogg_int32_t TotalMotionScore;
ogg_int64_t TotalByteCount;
ogg_int32_t FixedQ;
/* Frame Statistics */
signed char InterCodeCount;
ogg_int64_t CurrentFrame;
ogg_int64_t CarryOver ;
ogg_uint32_t LastFrameSize;
ogg_uint32_t FrameBitCount;
int ThisIsFirstFrame;
int ThisIsKeyFrame;
ogg_int32_t MotionScore;
ogg_uint32_t RegulationBlocks;
ogg_int32_t RecoveryMotionScore;
int RecoveryBlocksAdded ;
double ProportionRecBlocks;
double MaxRecFactor ;
/* Rate Targeting variables. */
ogg_uint32_t ThisFrameTargetBytes;
double BpbCorrectionFactor;
/* Up regulation variables */
ogg_uint32_t FinalPassLastPos; /* Used to regulate a final
unrestricted high quality
pass. */
ogg_uint32_t LastEndSB; /* Where we were in the loop
last time. */
ogg_uint32_t ResidueLastEndSB; /* Where we were in the residue
update loop last time. */
/* Controlling Block Selection */
ogg_uint32_t MVChangeFactor;
ogg_uint32_t FourMvChangeFactor;
ogg_uint32_t MinImprovementForNewMV;
ogg_uint32_t ExhaustiveSearchThresh;
ogg_uint32_t MinImprovementForFourMV;
ogg_uint32_t FourMVThreshold;
/* Module shared data structures. */
ogg_int32_t frame_target_rate;
ogg_int32_t BaseLineFrameTargetRate;
ogg_int32_t min_blocks_per_frame;
ogg_uint32_t tot_bytes_old;
/*********************************************************************/
/* Frames Used in the selective convolution filtering of the Y plane. */
unsigned char *ConvDestBuffer;
YUV_BUFFER_ENTRY *yuv0ptr;
YUV_BUFFER_ENTRY *yuv1ptr;
/*********************************************************************/
/*********************************************************************/
/* Token Buffers */
ogg_uint32_t *OptimisedTokenListEb; /* Optimised token list extra bits */
unsigned char *OptimisedTokenList; /* Optimised token list. */
unsigned char *OptimisedTokenListHi; /* Optimised token list huffman
table index */
unsigned char *OptimisedTokenListPl; /* Plane to which the token
belongs Y = 0 or UV = 1 */
ogg_int32_t OptimisedTokenCount; /* Count of Optimized tokens */
ogg_uint32_t RunHuffIndex; /* Huffman table in force at
the start of a run */
ogg_uint32_t RunPlaneIndex; /* The plane (Y=0 UV=1) to
which the first token in
an EOB run belonged. */
ogg_uint32_t TotTokenCount;
ogg_int32_t TokensToBeCoded;
ogg_int32_t TokensCoded;
/********************************************************************/
/* SuperBlock, MacroBLock and Fragment Information */
/* Coded flag arrays and counters for them */
unsigned char *PartiallyCodedFlags;
unsigned char *PartiallyCodedMbPatterns;
unsigned char *UncodedMbFlags;
unsigned char *extra_fragments; /* extra updates not
recommended by pre-processor */
ogg_int16_t *OriginalDC;
ogg_uint32_t *FragmentLastQ; /* Array used to keep track of
quality at which each
fragment was last
updated. */
unsigned char *FragTokens;
ogg_uint32_t *FragTokenCounts; /* Number of tokens per fragment */
ogg_uint32_t *RunHuffIndices;
ogg_uint32_t *LastCodedErrorScore;
ogg_uint32_t *ModeList;
MOTION_VECTOR *MVList;
unsigned char *BlockCodedFlags;
ogg_uint32_t MvListCount;
ogg_uint32_t ModeListCount;
unsigned char *DataOutputBuffer;
/*********************************************************************/
ogg_uint32_t RunLength;
ogg_uint32_t MaxBitTarget; /* Cut off target for rate capping */
double BitRateCapFactor; /* Factor relating delta frame target
to cut off target. */
unsigned char MBCodingMode; /* Coding mode flags */
ogg_int32_t MVPixelOffsetY[MAX_SEARCH_SITES];
ogg_uint32_t InterTripOutThresh;
unsigned char MVEnabled;
ogg_uint32_t MotionVectorSearchCount;
ogg_uint32_t FrameMVSearcOunt;
ogg_int32_t MVSearchSteps;
ogg_int32_t MVOffsetX[MAX_SEARCH_SITES];
ogg_int32_t MVOffsetY[MAX_SEARCH_SITES];
ogg_int32_t HalfPixelRef2Offset[9]; /* Offsets for half pixel
compensation */
signed char HalfPixelXOffset[9]; /* Half pixel MV offsets for X */
signed char HalfPixelYOffset[9]; /* Half pixel MV offsets for Y */
ogg_uint32_t bit_pattern ;
unsigned char bits_so_far ;
ogg_uint32_t lastval ;
ogg_uint32_t lastrun ;
Q_LIST_ENTRY *quantized_list;
MOTION_VECTOR MVector;
ogg_uint32_t TempBitCount;
ogg_int16_t *DCT_codes; /* Buffer that stores the result of
Forward DCT */
ogg_int16_t *DCTDataBuffer; /* Input data buffer for Forward DCT */
/* Motion compensation related variables */
ogg_uint32_t MvMaxExtent;
double QTargetModifier[Q_TABLE_SIZE];
/* instances (used for reconstructing buffers and to hold tokens etc.) */
PP_INSTANCE pp; /* preprocessor */
PB_INSTANCE pb; /* playback */
/* ogg bitpacker for use in packet coding, other API state */
oggpack_buffer *oggbuffer;
int readyflag;
int packetflag;
int doneflag;
DspFunctions dsp; /* Selected functions for this platform */
} CP_INSTANCE;
#define clamp255(x) ((unsigned char)((((x)<0)-1) & ((x) | -((x)>255))))
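/* A minimal sketch of how the branchless clamp255 macro above behaves,
   assuming two's-complement integers. The names clamp255_sketch,
   clamp255_ref and clamp255_agrees are hypothetical and exist only for
   this illustration. */

```c
#include <assert.h>

/* Same expression as clamp255, split into named parts.
   (x < 0) - 1  is 0 when x is negative, otherwise -1 (all bits set).
   -(x > 255)   is -1 when x exceeds 255, otherwise 0. */
static unsigned char clamp255_sketch(int x){
  int keep = (x < 0) - 1;   /* 0 forces the result to 0 */
  int sat  = -(x > 255);    /* -1 forces the low byte to 0xff */
  return (unsigned char)(keep & (x | sat));
}

/* Reference version with explicit branches, for comparison. */
static unsigned char clamp255_ref(int x){
  if (x < 0) return 0;
  if (x > 255) return 255;
  return (unsigned char)x;
}

/* Returns 1 when both versions agree for every x in [lo, hi]. */
static int clamp255_agrees(int lo, int hi){
  int x;
  assert(lo <= hi);
  for (x = lo; x <= hi; x++){
    if (clamp255_sketch(x) != clamp255_ref(x)) return 0;
  }
  return 1;
}
```

/* Calling clamp255_agrees(-1000, 1000) is one way to check the two
   forms against each other over a small range. */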
extern void ConfigurePP( PP_INSTANCE *ppi, int Level ) ;
extern ogg_uint32_t YUVAnalyseFrame( PP_INSTANCE *ppi,
ogg_uint32_t * KFIndicator );
extern void ClearPPInstance(PP_INSTANCE *ppi);
extern void InitPPInstance(PP_INSTANCE *ppi, DspFunctions *funcs);
extern void InitPBInstance(PB_INSTANCE *pbi);
extern void ClearPBInstance(PB_INSTANCE *pbi);
extern void IDct1( Q_LIST_ENTRY * InputData,
ogg_int16_t *QuantMatrix,
ogg_int16_t * OutputData );
extern void ReconIntra( PB_INSTANCE *pbi, unsigned char * ReconPtr,
ogg_int16_t * ChangePtr, ogg_uint32_t LineStep );
extern void ReconInter( PB_INSTANCE *pbi, unsigned char * ReconPtr,
unsigned char * RefPtr, ogg_int16_t * ChangePtr,
ogg_uint32_t LineStep ) ;
extern void ReconInterHalfPixel2( PB_INSTANCE *pbi, unsigned char * ReconPtr,
unsigned char * RefPtr1,
unsigned char * RefPtr2,
ogg_int16_t * ChangePtr,
ogg_uint32_t LineStep ) ;
extern void SetupLoopFilter(PB_INSTANCE *pbi);
extern void CopyBlock(unsigned char *src,
unsigned char *dest,
unsigned int srcstride);
extern void LoopFilter(PB_INSTANCE *pbi);
extern void ReconRefFrames (PB_INSTANCE *pbi);
extern void ExpandToken( Q_LIST_ENTRY * ExpandedBlock,
unsigned char * CoeffIndex, ogg_uint32_t Token,
ogg_int32_t ExtraBits );
extern void ClearDownQFragData(PB_INSTANCE *pbi);
extern void select_quantiser (PB_INSTANCE *pbi, int type);
extern void quantize( PB_INSTANCE *pbi,
ogg_int16_t * DCT_block,
Q_LIST_ENTRY * quantized_list);
extern void UpdateQ( PB_INSTANCE *pbi, int NewQIndex );
extern void UpdateQC( CP_INSTANCE *cpi, ogg_uint32_t NewQ );
extern void fdct_short ( ogg_int16_t * InputData, ogg_int16_t * OutputData );
extern ogg_uint32_t DPCMTokenizeBlock (CP_INSTANCE *cpi,
ogg_int32_t FragIndex);
extern void TransformQuantizeBlock (CP_INSTANCE *cpi, ogg_int32_t FragIndex,
ogg_uint32_t PixelsPerLine ) ;
extern void ClearFragmentInfo(PB_INSTANCE * pbi);
extern void InitFragmentInfo(PB_INSTANCE * pbi);
extern void ClearFrameInfo(PB_INSTANCE * pbi);
extern void InitFrameInfo(PB_INSTANCE * pbi, unsigned int FrameSize);
extern void InitializeFragCoordinates(PB_INSTANCE *pbi);
extern void InitFrameDetails(PB_INSTANCE *pbi);
extern void WriteQTables(PB_INSTANCE *pbi,oggpack_buffer *opb);
extern void InitQTables( PB_INSTANCE *pbi );
extern void quant_tables_init( PB_INSTANCE *pbi, const th_quant_info *qinfo);
extern void InitHuffmanSet( PB_INSTANCE *pbi );
extern void ClearHuffmanSet( PB_INSTANCE *pbi );
extern int ReadHuffmanTrees(codec_setup_info *ci, oggpack_buffer *opb);
extern void WriteHuffmanTrees(HUFF_ENTRY *HuffRoot[NUM_HUFF_TABLES],
oggpack_buffer *opb);
extern void InitHuffmanTrees(PB_INSTANCE *pbi, const codec_setup_info *ci);
extern void ClearHuffmanTrees(HUFF_ENTRY *HuffRoot[NUM_HUFF_TABLES]);
extern int ReadFilterTables(codec_setup_info *ci, oggpack_buffer *opb);
extern void QuadDecodeDisplayFragments ( PB_INSTANCE *pbi );
extern void PackAndWriteDFArray( CP_INSTANCE *cpi );
extern void UpdateFragQIndex(PB_INSTANCE *pbi);
extern void PostProcess(PB_INSTANCE *pbi);
extern void InitMotionCompensation ( CP_INSTANCE *cpi );
extern ogg_uint32_t GetMBIntraError (CP_INSTANCE *cpi, ogg_uint32_t FragIndex,
ogg_uint32_t PixelsPerLine ) ;
extern ogg_uint32_t GetMBInterError (CP_INSTANCE *cpi,
unsigned char * SrcPtr,
unsigned char * RefPtr,
ogg_uint32_t FragIndex,
ogg_int32_t LastXMV,
ogg_int32_t LastYMV,
ogg_uint32_t PixelsPerLine ) ;
extern void WriteFrameHeader( CP_INSTANCE *cpi) ;
extern ogg_uint32_t GetMBMVInterError (CP_INSTANCE *cpi,
unsigned char * RefFramePtr,
ogg_uint32_t FragIndex,
ogg_uint32_t PixelsPerLine,
ogg_int32_t *MVPixelOffset,
MOTION_VECTOR *MV );
extern ogg_uint32_t GetMBMVExhaustiveSearch (CP_INSTANCE *cpi,
unsigned char * RefFramePtr,
ogg_uint32_t FragIndex,
ogg_uint32_t PixelsPerLine,
MOTION_VECTOR *MV );
extern ogg_uint32_t GetFOURMVExhaustiveSearch (CP_INSTANCE *cpi,
unsigned char * RefFramePtr,
ogg_uint32_t FragIndex,
ogg_uint32_t PixelsPerLine,
MOTION_VECTOR *MV ) ;
extern ogg_uint32_t EncodeData(CP_INSTANCE *cpi);
extern ogg_uint32_t PickIntra( CP_INSTANCE *cpi,
ogg_uint32_t SBRows,
ogg_uint32_t SBCols);
extern ogg_uint32_t PickModes(CP_INSTANCE *cpi,
ogg_uint32_t SBRows,
ogg_uint32_t SBCols,
ogg_uint32_t PixelsPerLine,
ogg_uint32_t *InterError,
ogg_uint32_t *IntraError);
extern CODING_MODE FrArrayUnpackMode(PB_INSTANCE *pbi);
extern void CreateBlockMapping ( ogg_int32_t (*BlockMap)[4][4],
ogg_uint32_t YSuperBlocks,
ogg_uint32_t UVSuperBlocks,
ogg_uint32_t HFrags, ogg_uint32_t VFrags );
extern void UpRegulateDataStream (CP_INSTANCE *cpi, ogg_uint32_t RegulationQ,
ogg_int32_t RecoveryBlocks ) ;
extern void RegulateQ( CP_INSTANCE *cpi, ogg_int32_t UpdateScore );
extern void CopyBackExtraFrags(CP_INSTANCE *cpi);
extern void UpdateUMVBorder( PB_INSTANCE *pbi,
unsigned char * DestReconPtr );
extern void PInitFrameInfo(PP_INSTANCE * ppi);
extern double GetEstimatedBpb( CP_INSTANCE *cpi, ogg_uint32_t TargetQ );
extern void ClearTmpBuffers(PB_INSTANCE * pbi);
extern void InitTmpBuffers(PB_INSTANCE * pbi);
extern void ScanYUVInit( PP_INSTANCE * ppi,
SCAN_CONFIG_DATA * ScanConfigPtr);
#endif /* ENCODER_INTERNAL_H */

@ -1,268 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dct.c 13884 2007-09-22 08:38:10Z giles $
********************************************************************/
#include "codec_internal.h"
#include "dsp.h"
#include "../cpu.h"
static ogg_int32_t xC1S7 = 64277;
static ogg_int32_t xC2S6 = 60547;
static ogg_int32_t xC3S5 = 54491;
static ogg_int32_t xC4S4 = 46341;
static ogg_int32_t xC5S3 = 36410;
static ogg_int32_t xC6S2 = 25080;
static ogg_int32_t xC7S1 = 12785;
#define SIGNBITDUPPED(X) ((signed )(((X) & 0x80000000)) >> 31)
#define DOROUND(X) ( (SIGNBITDUPPED(X) & (0xffff)) + (X) )
static void fdct_short__c ( ogg_int16_t * InputData, ogg_int16_t * OutputData ){
int loop;
ogg_int32_t is07, is12, is34, is56;
ogg_int32_t is0734, is1256;
ogg_int32_t id07, id12, id34, id56;
ogg_int32_t irot_input_x, irot_input_y;
ogg_int32_t icommon_product1; /* Re-used product (c4s4 * (s12 - s56)). */
ogg_int32_t icommon_product2; /* Re-used product (c4s4 * (d12 + d56)). */
ogg_int32_t temp1, temp2; /* intermediate variable for computation */
ogg_int32_t InterData[64];
ogg_int32_t *ip = InterData;
ogg_int16_t * op = OutputData;
for (loop = 0; loop < 8; loop++){
/* Pre calculate some common sums and differences. */
is07 = InputData[0] + InputData[7];
is12 = InputData[1] + InputData[2];
is34 = InputData[3] + InputData[4];
is56 = InputData[5] + InputData[6];
id07 = InputData[0] - InputData[7];
id12 = InputData[1] - InputData[2];
id34 = InputData[3] - InputData[4];
id56 = InputData[5] - InputData[6];
is0734 = is07 + is34;
is1256 = is12 + is56;
/* Pre-Calculate some common product terms. */
icommon_product1 = xC4S4*(is12 - is56);
icommon_product1 = DOROUND(icommon_product1);
icommon_product1>>=16;
icommon_product2 = xC4S4*(id12 + id56);
icommon_product2 = DOROUND(icommon_product2);
icommon_product2>>=16;
ip[0] = (xC4S4*(is0734 + is1256));
ip[0] = DOROUND(ip[0]);
ip[0] >>= 16;
ip[4] = (xC4S4*(is0734 - is1256));
ip[4] = DOROUND(ip[4]);
ip[4] >>= 16;
/* Define inputs to rotation for outputs 2 and 6 */
irot_input_x = id12 - id56;
irot_input_y = is07 - is34;
/* Apply rotation for outputs 2 and 6. */
temp1=xC6S2*irot_input_x;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC2S6*irot_input_y;
temp2=DOROUND(temp2);
temp2>>=16;
ip[2] = temp1 + temp2;
temp1=xC6S2*irot_input_y;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC2S6*irot_input_x ;
temp2=DOROUND(temp2);
temp2>>=16;
ip[6] = temp1 -temp2 ;
/* Define inputs to rotation for outputs 1 and 7 */
irot_input_x = icommon_product1 + id07;
irot_input_y = -( id34 + icommon_product2 );
/* Apply rotation for outputs 1 and 7. */
temp1=xC1S7*irot_input_x;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC7S1*irot_input_y;
temp2=DOROUND(temp2);
temp2>>=16;
ip[1] = temp1 - temp2;
temp1=xC7S1*irot_input_x;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC1S7*irot_input_y ;
temp2=DOROUND(temp2);
temp2>>=16;
ip[7] = temp1 + temp2 ;
/* Define inputs to rotation for outputs 3 and 5 */
irot_input_x = id07 - icommon_product1;
irot_input_y = id34 - icommon_product2;
/* Apply rotation for outputs 3 and 5. */
temp1=xC3S5*irot_input_x;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC5S3*irot_input_y ;
temp2=DOROUND(temp2);
temp2>>=16;
ip[3] = temp1 - temp2 ;
temp1=xC5S3*irot_input_x;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC3S5*irot_input_y;
temp2=DOROUND(temp2);
temp2>>=16;
ip[5] = temp1 + temp2;
/* Increment data pointer for next row. */
InputData += 8 ;
ip += 8; /* advance pointer to next row */
}
/* Performed DCT on rows, now transform the columns */
ip = InterData;
for (loop = 0; loop < 8; loop++){
/* Pre calculate some common sums and differences. */
is07 = ip[0 * 8] + ip[7 * 8];
is12 = ip[1 * 8] + ip[2 * 8];
is34 = ip[3 * 8] + ip[4 * 8];
is56 = ip[5 * 8] + ip[6 * 8];
id07 = ip[0 * 8] - ip[7 * 8];
id12 = ip[1 * 8] - ip[2 * 8];
id34 = ip[3 * 8] - ip[4 * 8];
id56 = ip[5 * 8] - ip[6 * 8];
is0734 = is07 + is34;
is1256 = is12 + is56;
/* Pre-Calculate some common product terms. */
icommon_product1 = xC4S4*(is12 - is56) ;
icommon_product2 = xC4S4*(id12 + id56) ;
icommon_product1 = DOROUND(icommon_product1);
icommon_product2 = DOROUND(icommon_product2);
icommon_product1>>=16;
icommon_product2>>=16;
temp1 = xC4S4*(is0734 + is1256) ;
temp2 = xC4S4*(is0734 - is1256) ;
temp1 = DOROUND(temp1);
temp2 = DOROUND(temp2);
temp1>>=16;
temp2>>=16;
op[0*8] = (ogg_int16_t) temp1;
op[4*8] = (ogg_int16_t) temp2;
/* Define inputs to rotation for outputs 2 and 6 */
irot_input_x = id12 - id56;
irot_input_y = is07 - is34;
/* Apply rotation for outputs 2 and 6. */
temp1=xC6S2*irot_input_x;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC2S6*irot_input_y;
temp2=DOROUND(temp2);
temp2>>=16;
op[2*8] = (ogg_int16_t) (temp1 + temp2);
temp1=xC6S2*irot_input_y;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC2S6*irot_input_x ;
temp2=DOROUND(temp2);
temp2>>=16;
op[6*8] = (ogg_int16_t) (temp1 -temp2) ;
/* Define inputs to rotation for outputs 1 and 7 */
irot_input_x = icommon_product1 + id07;
irot_input_y = -( id34 + icommon_product2 );
/* Apply rotation for outputs 1 and 7. */
temp1=xC1S7*irot_input_x;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC7S1*irot_input_y;
temp2=DOROUND(temp2);
temp2>>=16;
op[1*8] = (ogg_int16_t) (temp1 - temp2);
temp1=xC7S1*irot_input_x;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC1S7*irot_input_y ;
temp2=DOROUND(temp2);
temp2>>=16;
op[7*8] = (ogg_int16_t) (temp1 + temp2);
/* Define inputs to rotation for outputs 3 and 5 */
irot_input_x = id07 - icommon_product1;
irot_input_y = id34 - icommon_product2;
/* Apply rotation for outputs 3 and 5. */
temp1=xC3S5*irot_input_x;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC5S3*irot_input_y ;
temp2=DOROUND(temp2);
temp2>>=16;
op[3*8] = (ogg_int16_t) (temp1 - temp2) ;
temp1=xC5S3*irot_input_x;
temp1=DOROUND(temp1);
temp1>>=16;
temp2=xC3S5*irot_input_y;
temp2=DOROUND(temp2);
temp2>>=16;
op[5*8] = (ogg_int16_t) (temp1 + temp2);
/* Increment data pointer for next column. */
ip ++;
op ++;
}
}
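/* A minimal sketch of the fixed-point step that fdct_short__c repeats:
   multiply by a constant scaled by 65536, apply DOROUND, shift right by
   16. DOROUND adds 0xffff to negative products, so the shift rounds
   toward zero instead of toward minus infinity. The sketch assumes the
   compiler implements >> on negative values as an arithmetic shift, as
   the macros above also do. fix_mul_trunc and fix_mul_floor are
   hypothetical names; 46341 is the xC4S4 constant, close to
   65536/sqrt(2). */

```c
#include <assert.h>
#include <stdint.h>

#define FIX_SHIFT 16

/* Product followed by the DOROUND correction and the shift. */
static int32_t fix_mul_trunc(int32_t c, int32_t x){
  int32_t p;
  /* Keep the product inside 32 bits for 16-bit coefficients. */
  assert(c >= 0 && c <= 65535 && x >= -32767 && x <= 32767);
  p = c * x;
  if (p < 0) p += 0xffff;   /* what DOROUND adds for negative values */
  return p >> FIX_SHIFT;
}

/* Same product without the correction: rounds toward minus infinity. */
static int32_t fix_mul_floor(int32_t c, int32_t x){
  assert(c >= 0 && c <= 65535 && x >= -32767 && x <= 32767);
  return (c * x) >> FIX_SHIFT;
}
```

/* With c = 46341, x = 10 gives 7 from both functions, while x = -10
   gives -7 from fix_mul_trunc and -8 from fix_mul_floor, so the
   corrected form is symmetric around zero. */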
void dsp_dct_init (DspFunctions *funcs, ogg_uint32_t cpu_flags)
{
funcs->fdct_short = fdct_short__c;
dsp_dct_decode_init(funcs, cpu_flags);
dsp_idct_init(funcs, cpu_flags);
#if defined(USE_ASM)
if (cpu_flags & OC_CPU_X86_MMX) {
dsp_mmx_fdct_init(funcs);
}
#endif
}

@ -1,941 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dct_decode.c 15400 2008-10-15 12:10:58Z tterribe $
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include "codec_internal.h"
#include "quant_lookup.h"
#define GOLDEN_FRAME_THRESH_Q 50
#define PUR 8
#define PU 4
#define PUL 2
#define PL 1
#define HIGHBITDUPPED(X) (((signed short) X) >> 15)
static const int ModeUsesMC[MAX_MODES] = { 0, 0, 1, 1, 1, 0, 1, 1 };
static void SetupBoundingValueArray_Generic(ogg_int16_t *BoundingValuePtr,
ogg_int32_t FLimit){
ogg_int32_t i;
/* Set up the bounding value array. */
memset ( BoundingValuePtr, 0, (256*sizeof(*BoundingValuePtr)) );
for ( i = 0; i < FLimit; i++ ){
BoundingValuePtr[127-i-FLimit] = (-FLimit+i);
BoundingValuePtr[127-i] = -i;
BoundingValuePtr[127+i] = i;
BoundingValuePtr[127+i+FLimit] = FLimit-i;
}
}
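/* A minimal sketch of the table that SetupBoundingValueArray_Generic
   builds, assuming the same 256-entry layout centred on index 127. The
   result is a tent shape: the value follows the offset up to FLimit,
   falls back to zero by 2*FLimit, and is zero beyond that.
   build_bounding_values, BV_SIZE and BV_CENTRE are hypothetical names. */

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BV_SIZE   256
#define BV_CENTRE 127

static void build_bounding_values(int16_t *bv, int flimit){
  int i;
  /* Keeps every index written below inside the 256-entry table. */
  assert(flimit >= 0 && flimit <= 63);
  memset(bv, 0, BV_SIZE * sizeof(*bv));
  for (i = 0; i < flimit; i++){
    bv[BV_CENTRE - i - flimit] = (int16_t)(-flimit + i); /* falling, negative side */
    bv[BV_CENTRE - i]          = (int16_t)(-i);          /* identity, negative side */
    bv[BV_CENTRE + i]          = (int16_t)i;             /* identity, positive side */
    bv[BV_CENTRE + i + flimit] = (int16_t)(flimit - i);  /* falling, positive side */
  }
}
```

/* For flimit = 4 the entries at offsets -7..7 from the centre are
   -1 -2 -3 -4 -3 -2 -1 0 1 2 3 4 3 2 1. */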
static void ExpandKFBlock ( PB_INSTANCE *pbi, ogg_int32_t FragmentNumber ){
ogg_uint32_t ReconPixelsPerLine;
ogg_int32_t ReconPixelIndex;
/* Select the appropriate inverse Q matrix and line stride */
if ( FragmentNumber<(ogg_int32_t)pbi->YPlaneFragments ){
ReconPixelsPerLine = pbi->YStride;
pbi->dequant_coeffs = pbi->dequant_Y_coeffs;
}else if ( FragmentNumber<(ogg_int32_t)(pbi->YPlaneFragments + pbi->UVPlaneFragments) ){
ReconPixelsPerLine = pbi->UVStride;
pbi->dequant_coeffs = pbi->dequant_U_coeffs;
}else{
ReconPixelsPerLine = pbi->UVStride;
pbi->dequant_coeffs = pbi->dequant_V_coeffs;
}
/* Set up pointer into the quantisation buffer. */
pbi->quantized_list = &pbi->QFragData[FragmentNumber][0];
/* Invert quantisation and DCT to get pixel data. */
switch(pbi->FragCoefEOB[FragmentNumber]){
case 0:case 1:
IDct1( pbi->quantized_list, pbi->dequant_coeffs, pbi->ReconDataBuffer );
break;
case 2: case 3:
dsp_IDct3(pbi->dsp, pbi->quantized_list, pbi->dequant_coeffs, pbi->ReconDataBuffer );
break;
case 4:case 5:case 6:case 7:case 8: case 9:case 10:
dsp_IDct10(pbi->dsp, pbi->quantized_list, pbi->dequant_coeffs, pbi->ReconDataBuffer );
break;
default:
dsp_IDctSlow(pbi->dsp, pbi->quantized_list, pbi->dequant_coeffs, pbi->ReconDataBuffer );
}
/* Convert fragment number to a pixel offset in a reconstruction buffer. */
ReconPixelIndex = pbi->recon_pixel_index_table[FragmentNumber];
/* Get the pixel index for the first pixel in the fragment. */
dsp_recon_intra8x8 (pbi->dsp, (unsigned char *)(&pbi->ThisFrameRecon[ReconPixelIndex]),
(ogg_int16_t *)pbi->ReconDataBuffer, ReconPixelsPerLine);
}
static void ExpandBlock ( PB_INSTANCE *pbi, ogg_int32_t FragmentNumber){
unsigned char *LastFrameRecPtr; /* Pointer into previous frame
reconstruction. */
unsigned char *LastFrameRecPtr2; /* Pointer into previous frame
reconstruction for 1/2 pixel MC. */
ogg_uint32_t ReconPixelsPerLine; /* Pixels per line */
ogg_int32_t ReconPixelIndex; /* Offset for block into a
reconstruction buffer */
ogg_int32_t ReconPtr2Offset; /* Offset for second
reconstruction in half pixel
MC */
ogg_int32_t MVOffset; /* Baseline motion vector offset */
ogg_int32_t MvShift ; /* Shift to correct to 1/2 or 1/4 pixel */
ogg_int32_t MvModMask; /* Mask to determine whether 1/2
pixel is used */
/* Get coding mode for this block */
if ( pbi->FrameType == KEY_FRAME ){
pbi->CodingMode = CODE_INTRA;
}else{
/* Get Motion vector and mode for this block. */
pbi->CodingMode = pbi->FragCodingMethod[FragmentNumber];
}
/* Select the appropriate inverse Q matrix and line stride */
if ( FragmentNumber<(ogg_int32_t)pbi->YPlaneFragments ) {
ReconPixelsPerLine = pbi->YStride;
MvShift = 1;
MvModMask = 0x00000001;
/* Select appropriate dequantiser matrix. */
if ( pbi->CodingMode == CODE_INTRA )
pbi->dequant_coeffs = pbi->dequant_Y_coeffs;
else
pbi->dequant_coeffs = pbi->dequant_InterY_coeffs;
}else{
ReconPixelsPerLine = pbi->UVStride;
MvShift = 2;
MvModMask = 0x00000003;
/* Select appropriate dequantiser matrix. */
if ( pbi->CodingMode == CODE_INTRA )
if ( FragmentNumber <
(ogg_int32_t)(pbi->YPlaneFragments + pbi->UVPlaneFragments) )
pbi->dequant_coeffs = pbi->dequant_U_coeffs;
else
pbi->dequant_coeffs = pbi->dequant_V_coeffs;
else
if ( FragmentNumber <
(ogg_int32_t)(pbi->YPlaneFragments + pbi->UVPlaneFragments) )
pbi->dequant_coeffs = pbi->dequant_InterU_coeffs;
else
pbi->dequant_coeffs = pbi->dequant_InterV_coeffs;
}
/* Set up pointer into the quantisation buffer. */
pbi->quantized_list = &pbi->QFragData[FragmentNumber][0];
/* Invert quantisation and DCT to get pixel data. */
switch(pbi->FragCoefEOB[FragmentNumber]){
case 0:case 1:
IDct1( pbi->quantized_list, pbi->dequant_coeffs, pbi->ReconDataBuffer );
break;
case 2: case 3:
dsp_IDct3(pbi->dsp, pbi->quantized_list, pbi->dequant_coeffs, pbi->ReconDataBuffer );
break;
case 4:case 5:case 6:case 7:case 8: case 9:case 10:
dsp_IDct10(pbi->dsp, pbi->quantized_list, pbi->dequant_coeffs, pbi->ReconDataBuffer );
break;
default:
dsp_IDctSlow(pbi->dsp, pbi->quantized_list, pbi->dequant_coeffs, pbi->ReconDataBuffer );
}
/* Convert fragment number to a pixel offset in a reconstruction buffer. */
ReconPixelIndex = pbi->recon_pixel_index_table[FragmentNumber];
/* Action depends on decode mode. */
if ( pbi->CodingMode == CODE_INTER_NO_MV ){
/* Inter with no motion vector */
/* Reconstruct the pixel data using the last frame reconstruction
and change data. When the motion vector is (0,0), the recon is
based on the last frame without loop filtering (for testing). */
dsp_recon_inter8x8 (pbi->dsp, &pbi->ThisFrameRecon[ReconPixelIndex],
&pbi->LastFrameRecon[ReconPixelIndex],
pbi->ReconDataBuffer, ReconPixelsPerLine);
}else if ( ModeUsesMC[pbi->CodingMode] ) {
/* The mode uses a motion vector. */
/* Get vector from list */
pbi->MVector.x = pbi->FragMVect[FragmentNumber].x;
pbi->MVector.y = pbi->FragMVect[FragmentNumber].y;
/* Work out the base motion vector offset and the 1/2 pixel offset
if any. For the U and V planes the MV specifies 1/4 pixel
accuracy. This is adjusted to 1/2 pixel as follows ( 0->0,
1/4->1/2, 1/2->1/2, 3/4->1/2 ). */
MVOffset = 0;
ReconPtr2Offset = 0;
if ( pbi->MVector.x > 0 ){
MVOffset = pbi->MVector.x >> MvShift;
if ( pbi->MVector.x & MvModMask )
ReconPtr2Offset += 1;
} else if ( pbi->MVector.x < 0 ) {
MVOffset -= (-pbi->MVector.x) >> MvShift;
if ( (-pbi->MVector.x) & MvModMask )
ReconPtr2Offset -= 1;
}
if ( pbi->MVector.y > 0 ){
MVOffset += (pbi->MVector.y >> MvShift) * ReconPixelsPerLine;
if ( pbi->MVector.y & MvModMask )
ReconPtr2Offset += ReconPixelsPerLine;
} else if ( pbi->MVector.y < 0 ){
MVOffset -= ((-pbi->MVector.y) >> MvShift) * ReconPixelsPerLine;
if ( (-pbi->MVector.y) & MvModMask )
ReconPtr2Offset -= ReconPixelsPerLine;
}
/* Set up the first of the two reconstruction buffer pointers. */
if ( pbi->CodingMode==CODE_GOLDEN_MV ) {
LastFrameRecPtr = &pbi->GoldenFrame[ReconPixelIndex] + MVOffset;
}else{
LastFrameRecPtr = &pbi->LastFrameRecon[ReconPixelIndex] + MVOffset;
}
/* Set up the second of the two reconstruction pointers. */
LastFrameRecPtr2 = LastFrameRecPtr + ReconPtr2Offset;
/* Select the appropriate reconstruction function */
if ( (int)(LastFrameRecPtr - LastFrameRecPtr2) == 0 ) {
/* Reconstruct the pixel data from the reference frame and change data
(no half pixel in this case as the two references were the same). */
dsp_recon_inter8x8 (pbi->dsp,
&pbi->ThisFrameRecon[ReconPixelIndex],
LastFrameRecPtr, pbi->ReconDataBuffer,
ReconPixelsPerLine);
}else{
/* Fractional pixel reconstruction. */
/* Note that we only use two pixels per reconstruction even for
the diagonal. */
dsp_recon_inter8x8_half(pbi->dsp, &pbi->ThisFrameRecon[ReconPixelIndex],
LastFrameRecPtr, LastFrameRecPtr2,
pbi->ReconDataBuffer, ReconPixelsPerLine);
}
} else if ( pbi->CodingMode == CODE_USING_GOLDEN ){
/* Golden frame without motion vector */
/* Reconstruct the pixel data using the golden frame
reconstruction and change data */
dsp_recon_inter8x8 (pbi->dsp, &pbi->ThisFrameRecon[ReconPixelIndex],
&pbi->GoldenFrame[ ReconPixelIndex ],
pbi->ReconDataBuffer, ReconPixelsPerLine);
} else {
/* Simple Intra coding */
/* Get the pixel index for the first pixel in the fragment. */
dsp_recon_intra8x8 (pbi->dsp, &pbi->ThisFrameRecon[ReconPixelIndex],
pbi->ReconDataBuffer, ReconPixelsPerLine);
}
}
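/* A minimal sketch of the motion vector split performed in ExpandBlock:
   each component contributes a whole-pixel part to the base offset and,
   when any fractional bits are set, a one-pixel (or one-line) step
   toward the second reference used for half-pixel averaging. The
   mv_offsets type and split_mv function are hypothetical names; shift
   and mask correspond to MvShift and MvModMask (1 and 0x1 for Y, 2 and
   0x3 for U and V in the code above). */

```c
#include <assert.h>

typedef struct {
  int base;   /* offset of the first reference block (MVOffset) */
  int half;   /* offset from the first to the second reference (ReconPtr2Offset) */
} mv_offsets;

static mv_offsets split_mv(int mvx, int mvy, int shift, int mask, int stride){
  mv_offsets o = {0, 0};
  assert(shift == 1 || shift == 2);
  assert(stride > 0);
  /* Horizontal component: work on the magnitude so the shift is always
     applied to a non-negative value. */
  if (mvx > 0){
    o.base += mvx >> shift;
    if (mvx & mask) o.half += 1;
  }else if (mvx < 0){
    o.base -= (-mvx) >> shift;
    if ((-mvx) & mask) o.half -= 1;
  }
  /* Vertical component: same idea, scaled by the line stride. */
  if (mvy > 0){
    o.base += (mvy >> shift) * stride;
    if (mvy & mask) o.half += stride;
  }else if (mvy < 0){
    o.base -= ((-mvy) >> shift) * stride;
    if ((-mvy) & mask) o.half -= stride;
  }
  return o;
}
```

/* A half offset of zero means both references coincide, which is the
   case in which the code above selects the plain inter reconstruction
   instead of the half-pixel one. */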
static void UpdateUMV_HBorders( PB_INSTANCE *pbi,
unsigned char * DestReconPtr,
ogg_uint32_t PlaneFragOffset ) {
ogg_uint32_t i;
ogg_uint32_t PixelIndex;
ogg_uint32_t PlaneStride;
ogg_uint32_t BlockVStep;
ogg_uint32_t PlaneFragments;
ogg_uint32_t LineFragments;
ogg_uint32_t PlaneBorderWidth;
unsigned char *SrcPtr1;
unsigned char *SrcPtr2;
unsigned char *DestPtr1;
unsigned char *DestPtr2;
/* Work out various plane specific values */
if ( PlaneFragOffset == 0 ) {
/* Y Plane */
BlockVStep = (pbi->YStride *
(VFRAGPIXELS - 1));
PlaneStride = pbi->YStride;
PlaneBorderWidth = UMV_BORDER;
PlaneFragments = pbi->YPlaneFragments;
LineFragments = pbi->HFragments;
}else{
/* U or V plane. */
BlockVStep = (pbi->UVStride *
(VFRAGPIXELS - 1));
PlaneStride = pbi->UVStride;
PlaneBorderWidth = UMV_BORDER / 2;
PlaneFragments = pbi->UVPlaneFragments;
LineFragments = pbi->HFragments / 2;
}
/* Setup the source and destination pointers for the top and bottom
borders */
PixelIndex = pbi->recon_pixel_index_table[PlaneFragOffset];
SrcPtr1 = &DestReconPtr[ PixelIndex - PlaneBorderWidth ];
DestPtr1 = SrcPtr1 - (PlaneBorderWidth * PlaneStride);
PixelIndex = pbi->recon_pixel_index_table[PlaneFragOffset +
PlaneFragments - LineFragments] +
BlockVStep;
SrcPtr2 = &DestReconPtr[ PixelIndex - PlaneBorderWidth];
DestPtr2 = SrcPtr2 + PlaneStride;
/* Now copy the top and bottom source lines into each line of the
respective borders */
for ( i = 0; i < PlaneBorderWidth; i++ ) {
memcpy( DestPtr1, SrcPtr1, PlaneStride );
memcpy( DestPtr2, SrcPtr2, PlaneStride );
DestPtr1 += PlaneStride;
DestPtr2 += PlaneStride;
}
}
static void UpdateUMV_VBorders( PB_INSTANCE *pbi,
unsigned char * DestReconPtr,
ogg_uint32_t PlaneFragOffset ){
ogg_uint32_t i;
ogg_uint32_t PixelIndex;
ogg_uint32_t PlaneStride;
ogg_uint32_t LineFragments;
ogg_uint32_t PlaneBorderWidth;
ogg_uint32_t PlaneHeight;
unsigned char *SrcPtr1;
unsigned char *SrcPtr2;
unsigned char *DestPtr1;
unsigned char *DestPtr2;
/* Work out various plane specific values */
if ( PlaneFragOffset == 0 ) {
/* Y Plane */
PlaneStride = pbi->YStride;
PlaneBorderWidth = UMV_BORDER;
LineFragments = pbi->HFragments;
PlaneHeight = pbi->info.height;
}else{
/* U or V plane. */
PlaneStride = pbi->UVStride;
PlaneBorderWidth = UMV_BORDER / 2;
LineFragments = pbi->HFragments / 2;
PlaneHeight = pbi->info.height / 2;
}
/* Setup the source data values and destination pointers for the
left and right edge borders */
PixelIndex = pbi->recon_pixel_index_table[PlaneFragOffset];
SrcPtr1 = &DestReconPtr[ PixelIndex ];
DestPtr1 = &DestReconPtr[ PixelIndex - PlaneBorderWidth ];
PixelIndex = pbi->recon_pixel_index_table[PlaneFragOffset +
LineFragments - 1] +
(HFRAGPIXELS - 1);
SrcPtr2 = &DestReconPtr[ PixelIndex ];
DestPtr2 = &DestReconPtr[ PixelIndex + 1 ];
/* Now replicate the left and right edge pixels of each line into the
respective borders */
for ( i = 0; i < PlaneHeight; i++ ) {
memset( DestPtr1, SrcPtr1[0], PlaneBorderWidth );
memset( DestPtr2, SrcPtr2[0], PlaneBorderWidth );
SrcPtr1 += PlaneStride;
SrcPtr2 += PlaneStride;
DestPtr1 += PlaneStride;
DestPtr2 += PlaneStride;
}
}
void UpdateUMVBorder( PB_INSTANCE *pbi,
unsigned char * DestReconPtr ) {
ogg_uint32_t PlaneFragOffset;
/* Y plane */
PlaneFragOffset = 0;
UpdateUMV_VBorders( pbi, DestReconPtr, PlaneFragOffset );
UpdateUMV_HBorders( pbi, DestReconPtr, PlaneFragOffset );
/* Then the U and V Planes */
PlaneFragOffset = pbi->YPlaneFragments;
UpdateUMV_VBorders( pbi, DestReconPtr, PlaneFragOffset );
UpdateUMV_HBorders( pbi, DestReconPtr, PlaneFragOffset );
PlaneFragOffset = pbi->YPlaneFragments + pbi->UVPlaneFragments;
UpdateUMV_VBorders( pbi, DestReconPtr, PlaneFragOffset );
UpdateUMV_HBorders( pbi, DestReconPtr, PlaneFragOffset );
}
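/* A minimal sketch of the border extension that the UpdateUMV_* functions
   perform, reduced to a single plane with a symmetric border. Left and
   right borders are filled first by replicating the edge pixel of each
   line; the top and bottom borders then copy whole padded lines, which
   also fills the corners. pad_plane and its parameters are hypothetical;
   the stride is assumed to be width + 2 * border. */

```c
#include <assert.h>
#include <string.h>

static void pad_plane(unsigned char *buf, int width, int height, int border){
  int stride = width + 2 * border;
  unsigned char *first = buf + border * stride;          /* first image line, including its left border */
  unsigned char *last  = first + (height - 1) * stride;  /* last image line */
  int y, i;
  assert(width > 0 && height > 0 && border >= 0);
  /* Left and right borders: replicate the edge pixel of every line. */
  for (y = 0; y < height; y++){
    unsigned char *row = first + y * stride;
    memset(row, row[border], (size_t)border);
    memset(row + border + width, row[border + width - 1], (size_t)border);
  }
  /* Top and bottom borders: copy the full padded first and last lines. */
  for (i = 1; i <= border; i++){
    memcpy(first - i * stride, first, (size_t)stride);
    memcpy(last + i * stride, last, (size_t)stride);
  }
}
```

/* The order matters: running the horizontal pass first is what lets the
   vertical pass pick up already-padded lines. */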
static void CopyRecon( PB_INSTANCE *pbi, unsigned char * DestReconPtr,
unsigned char * SrcReconPtr ) {
ogg_uint32_t i;
ogg_uint32_t PlaneLineStep; /* Pixels per line */
ogg_uint32_t PixelIndex;
unsigned char *SrcPtr; /* Pointer to line of source image data */
unsigned char *DestPtr; /* Pointer to line of destination image data */
/* Copy over only updated blocks.*/
/* First Y plane */
PlaneLineStep = pbi->YStride;
for ( i = 0; i < pbi->YPlaneFragments; i++ ) {
if ( pbi->display_fragments[i] ) {
PixelIndex = pbi->recon_pixel_index_table[i];
SrcPtr = &SrcReconPtr[ PixelIndex ];
DestPtr = &DestReconPtr[ PixelIndex ];
dsp_copy8x8 (pbi->dsp, SrcPtr, DestPtr, PlaneLineStep);
}
}
/* Then U and V */
PlaneLineStep = pbi->UVStride;
for ( i = pbi->YPlaneFragments; i < pbi->UnitFragments; i++ ) {
if ( pbi->display_fragments[i] ) {
PixelIndex = pbi->recon_pixel_index_table[i];
SrcPtr = &SrcReconPtr[ PixelIndex ];
DestPtr = &DestReconPtr[ PixelIndex ];
dsp_copy8x8 (pbi->dsp, SrcPtr, DestPtr, PlaneLineStep);
}
}
}
static void CopyNotRecon( PB_INSTANCE *pbi, unsigned char * DestReconPtr,
unsigned char * SrcReconPtr ) {
ogg_uint32_t i;
ogg_uint32_t PlaneLineStep; /* Pixels per line */
ogg_uint32_t PixelIndex;
unsigned char *SrcPtr; /* Pointer to line of source image data */
unsigned char *DestPtr; /* Pointer to line of destination image data*/
/* Copy over only the blocks that were not updated. */
/* First Y plane */
PlaneLineStep = pbi->YStride;
for ( i = 0; i < pbi->YPlaneFragments; i++ ) {
if ( !pbi->display_fragments[i] ) {
PixelIndex = pbi->recon_pixel_index_table[i];
SrcPtr = &SrcReconPtr[ PixelIndex ];
DestPtr = &DestReconPtr[ PixelIndex ];
dsp_copy8x8 (pbi->dsp, SrcPtr, DestPtr, PlaneLineStep);
}
}
/* Then U and V */
PlaneLineStep = pbi->UVStride;
for ( i = pbi->YPlaneFragments; i < pbi->UnitFragments; i++ ) {
if ( !pbi->display_fragments[i] ) {
PixelIndex = pbi->recon_pixel_index_table[i];
SrcPtr = &SrcReconPtr[ PixelIndex ];
DestPtr = &DestReconPtr[ PixelIndex ];
dsp_copy8x8 (pbi->dsp, SrcPtr, DestPtr, PlaneLineStep);
}
}
}
void ExpandToken( Q_LIST_ENTRY * ExpandedBlock,
unsigned char * CoeffIndex, ogg_uint32_t Token,
ogg_int32_t ExtraBits ){
/* Is the token a combined run and value token? */
if ( Token >= DCT_RUN_CATEGORY1 ){
/* Expand the token and additional bits to a zero run length and
data value. */
if ( Token < DCT_RUN_CATEGORY2 ) {
/* Decoding method depends on token */
if ( Token < DCT_RUN_CATEGORY1B ) {
/* Step on by the zero run length */
*CoeffIndex += (unsigned char)((Token - DCT_RUN_CATEGORY1) + 1);
/* The extra bit determines the sign. */
if ( ExtraBits & 0x01 )
ExpandedBlock[*CoeffIndex] = -1;
else
ExpandedBlock[*CoeffIndex] = 1;
} else if ( Token == DCT_RUN_CATEGORY1B ) {
/* Bits 0-1 determine the zero run length */
*CoeffIndex += (6 + (ExtraBits & 0x03));
/* Bit 2 determines the sign */
if ( ExtraBits & 0x04 )
ExpandedBlock[*CoeffIndex] = -1;
else
ExpandedBlock[*CoeffIndex] = 1;
}else{
/* Bits 0-2 determine the zero run length */
*CoeffIndex += (10 + (ExtraBits & 0x07));
/* Bit 3 determines the sign */
if ( ExtraBits & 0x08 )
ExpandedBlock[*CoeffIndex] = -1;
else
ExpandedBlock[*CoeffIndex] = 1;
}
}else{
/* If token == DCT_RUN_CATEGORY2 we have a single 0 followed by
a value */
if ( Token == DCT_RUN_CATEGORY2 ){
/* Step on by the zero run length */
*CoeffIndex += 1;
/* Bit 1 determines sign, bit 0 the value */
if ( ExtraBits & 0x02 )
ExpandedBlock[*CoeffIndex] = -(2 + (ExtraBits & 0x01));
else
ExpandedBlock[*CoeffIndex] = 2 + (ExtraBits & 0x01);
}else{
/* else we have 2->3 zeros followed by a value */
/* Bit 0 determines the zero run length */
*CoeffIndex += 2 + (ExtraBits & 0x01);
/* Bit 2 determines the sign, bit 1 the value */
if ( ExtraBits & 0x04 )
ExpandedBlock[*CoeffIndex] = -(2 + ((ExtraBits & 0x02) >> 1));
else
ExpandedBlock[*CoeffIndex] = 2 + ((ExtraBits & 0x02) >> 1);
}
}
/* Step on over value */
*CoeffIndex += 1;
} else if ( Token == DCT_SHORT_ZRL_TOKEN ) {
/* Token is a ZRL token so step on by the appropriate number of zeros */
*CoeffIndex += ExtraBits + 1;
} else if ( Token == DCT_ZRL_TOKEN ) {
/* Token is a ZRL token so step on by the appropriate number of zeros */
*CoeffIndex += ExtraBits + 1;
} else if ( Token < LOW_VAL_TOKENS ) {
/* Token is a small single value token. */
switch ( Token ) {
case ONE_TOKEN:
ExpandedBlock[*CoeffIndex] = 1;
break;
case MINUS_ONE_TOKEN:
ExpandedBlock[*CoeffIndex] = -1;
break;
case TWO_TOKEN:
ExpandedBlock[*CoeffIndex] = 2;
break;
case MINUS_TWO_TOKEN:
ExpandedBlock[*CoeffIndex] = -2;
break;
}
/* Step on the coefficient index. */
*CoeffIndex += 1;
}else{
/* Token is a larger single value token */
/* Expand the token and additional bits to a data value. */
if ( Token < DCT_VAL_CATEGORY3 ) {
/* Offset from LOW_VAL_TOKENS determines value */
Token = Token - LOW_VAL_TOKENS;
/* Extra bit determines sign */
if ( ExtraBits )
ExpandedBlock[*CoeffIndex] =
-((Q_LIST_ENTRY)(Token + DCT_VAL_CAT2_MIN));
else
ExpandedBlock[*CoeffIndex] =
(Q_LIST_ENTRY)(Token + DCT_VAL_CAT2_MIN);
} else if ( Token == DCT_VAL_CATEGORY3 ) {
/* Bit 1 determines sign, Bit 0 the value */
if ( ExtraBits & 0x02 )
ExpandedBlock[*CoeffIndex] = -(DCT_VAL_CAT3_MIN + (ExtraBits & 0x01));
else
ExpandedBlock[*CoeffIndex] = DCT_VAL_CAT3_MIN + (ExtraBits & 0x01);
} else if ( Token == DCT_VAL_CATEGORY4 ) {
/* Bit 2 determines sign, Bit 0-1 the value */
if ( ExtraBits & 0x04 )
ExpandedBlock[*CoeffIndex] = -(DCT_VAL_CAT4_MIN + (ExtraBits & 0x03));
else
ExpandedBlock[*CoeffIndex] = DCT_VAL_CAT4_MIN + (ExtraBits & 0x03);
} else if ( Token == DCT_VAL_CATEGORY5 ) {
/* Bit 3 determines sign, Bit 0-2 the value */
if ( ExtraBits & 0x08 )
ExpandedBlock[*CoeffIndex] = -(DCT_VAL_CAT5_MIN + (ExtraBits & 0x07));
else
ExpandedBlock[*CoeffIndex] = DCT_VAL_CAT5_MIN + (ExtraBits & 0x07);
} else if ( Token == DCT_VAL_CATEGORY6 ) {
/* Bit 4 determines sign, Bit 0-3 the value */
if ( ExtraBits & 0x10 )
ExpandedBlock[*CoeffIndex] = -(DCT_VAL_CAT6_MIN + (ExtraBits & 0x0F));
else
ExpandedBlock[*CoeffIndex] = DCT_VAL_CAT6_MIN + (ExtraBits & 0x0F);
} else if ( Token == DCT_VAL_CATEGORY7 ) {
/* Bit 5 determines sign, Bit 0-4 the value */
if ( ExtraBits & 0x20 )
ExpandedBlock[*CoeffIndex] = -(DCT_VAL_CAT7_MIN + (ExtraBits & 0x1F));
else
ExpandedBlock[*CoeffIndex] = DCT_VAL_CAT7_MIN + (ExtraBits & 0x1F);
} else if ( Token == DCT_VAL_CATEGORY8 ) {
/* Bit 9 determines sign, Bit 0-8 the value */
if ( ExtraBits & 0x200 )
ExpandedBlock[*CoeffIndex] = -(DCT_VAL_CAT8_MIN + (ExtraBits & 0x1FF));
else
ExpandedBlock[*CoeffIndex] = DCT_VAL_CAT8_MIN + (ExtraBits & 0x1FF);
}
/* Step on the coefficient index. */
*CoeffIndex += 1;
}
}
void ClearDownQFragData(PB_INSTANCE *pbi){
ogg_int32_t i;
Q_LIST_ENTRY * QFragPtr;
for ( i = 0; i < pbi->CodedBlockIndex; i++ ) {
/* Get the linear index for the current fragment. */
QFragPtr = pbi->QFragData[pbi->CodedBlockList[i]];
memset(QFragPtr, 0, 64*sizeof(Q_LIST_ENTRY));
}
}
static void loop_filter_h(unsigned char * PixelPtr,
ogg_int32_t LineLength,
ogg_int16_t *BoundingValuePtr){
ogg_int32_t j;
ogg_int32_t FiltVal;
PixelPtr-=2;
for ( j = 0; j < 8; j++ ){
FiltVal =
( PixelPtr[0] ) -
( PixelPtr[1] * 3 ) +
( PixelPtr[2] * 3 ) -
( PixelPtr[3] );
FiltVal = *(BoundingValuePtr+((FiltVal + 4) >> 3));
PixelPtr[1] = clamp255(PixelPtr[1] + FiltVal);
PixelPtr[2] = clamp255(PixelPtr[2] - FiltVal);
PixelPtr += LineLength;
}
}
static void loop_filter_v(unsigned char * PixelPtr,
ogg_int32_t LineLength,
ogg_int16_t *BoundingValuePtr){
ogg_int32_t j;
ogg_int32_t FiltVal;
PixelPtr -= 2*LineLength;
for ( j = 0; j < 8; j++ ) {
FiltVal = ( (ogg_int32_t)PixelPtr[0] ) -
( (ogg_int32_t)PixelPtr[LineLength] * 3 ) +
( (ogg_int32_t)PixelPtr[2 * LineLength] * 3 ) -
( (ogg_int32_t)PixelPtr[3 * LineLength] );
FiltVal = *(BoundingValuePtr+((FiltVal + 4) >> 3));
PixelPtr[LineLength] = clamp255(PixelPtr[LineLength] + FiltVal);
PixelPtr[2 * LineLength] = clamp255(PixelPtr[2*LineLength] - FiltVal);
PixelPtr ++;
}
}
static void LoopFilter__c(PB_INSTANCE *pbi, int FLimit){
int j;
ogg_int16_t BoundingValues[256];
ogg_int16_t *bvp = BoundingValues+127;
unsigned char *cp = pbi->display_fragments;
ogg_uint32_t *bp = pbi->recon_pixel_index_table;
if ( FLimit == 0 ) return;
SetupBoundingValueArray_Generic(BoundingValues, FLimit);
for ( j = 0; j < 3 ; j++){
ogg_uint32_t *bp_begin = bp;
ogg_uint32_t *bp_end;
int stride;
int h;
switch(j) {
case 0: /* y */
bp_end = bp + pbi->YPlaneFragments;
h = pbi->HFragments;
stride = pbi->YStride;
break;
default: /* u,v, 4:2:0 specific */
bp_end = bp + pbi->UVPlaneFragments;
h = pbi->HFragments >> 1;
stride = pbi->UVStride;
break;
}
while(bp<bp_end){
ogg_uint32_t *bp_left = bp;
ogg_uint32_t *bp_right = bp + h;
while(bp<bp_right){
if(cp[0]){
if(bp>bp_left)
loop_filter_h(&pbi->LastFrameRecon[bp[0]],stride,bvp);
if(bp_left>bp_begin)
loop_filter_v(&pbi->LastFrameRecon[bp[0]],stride,bvp);
if(bp+1<bp_right && !cp[1])
loop_filter_h(&pbi->LastFrameRecon[bp[0]]+8,stride,bvp);
if(bp+h<bp_end && !cp[h])
loop_filter_v(&pbi->LastFrameRecon[bp[h]],stride,bvp);
}
bp++;
cp++;
}
}
}
}
void ReconRefFrames (PB_INSTANCE *pbi){
ogg_int32_t i;
unsigned char *SwapReconBuffersTemp;
/* predictor multiplier up-left, up, up-right,left, shift
Entries are packed in the order L, UL, U, UR, with missing entries
moved to the end (before the shift parameters). */
static const ogg_int16_t pc[16][6]={
{0,0,0,0,0,0},
{1,0,0,0,0,0}, /* PL */
{1,0,0,0,0,0}, /* PUL */
{1,0,0,0,0,0}, /* PUL|PL */
{1,0,0,0,0,0}, /* PU */
{1,1,0,0,1,1}, /* PU|PL */
{0,1,0,0,0,0}, /* PU|PUL */
{29,-26,29,0,5,31}, /* PU|PUL|PL */
{1,0,0,0,0,0}, /* PUR */
{75,53,0,0,7,127}, /* PUR|PL */
{1,1,0,0,1,1}, /* PUR|PUL */
{75,0,53,0,7,127}, /* PUR|PUL|PL */
{1,0,0,0,0,0}, /* PUR|PU */
{75,0,53,0,7,127}, /* PUR|PU|PL */
{3,10,3,0,4,15}, /* PUR|PU|PUL */
{29,-26,29,0,5,31} /* PUR|PU|PUL|PL */
};
/* boundary case bit masks. */
static const int bc_mask[8]={
/* normal case no boundary condition */
PUR|PU|PUL|PL,
/* left column */
PUR|PU,
/* top row */
PL,
/* top row, left column */
0,
/* right column */
PU|PUL|PL,
/* right and left column */
PU,
/* top row, right column */
PL,
/* top row, right and left column */
0
};
/* value left, value up-left, value up, value up-right; missing
values are skipped. */
int v[4];
/* fragment number left, up-left, up, up-right */
int fn[4];
/* predictor count. */
int pcount;
short wpc;
static const short Mode2Frame[] = {
1, /* CODE_INTER_NO_MV 0 => Encoded diff from same MB last frame */
0, /* CODE_INTRA 1 => DCT Encoded Block */
1, /* CODE_INTER_PLUS_MV 2 => Encoded diff from included MV MB last frame */
1, /* CODE_INTER_LAST_MV 3 => Encoded diff from MRU MV MB last frame */
1, /* CODE_INTER_PRIOR_MV 4 => Encoded diff from prior-last (second MRU) MV MB last frame */
2, /* CODE_USING_GOLDEN 5 => Encoded diff from same MB golden frame */
2, /* CODE_GOLDEN_MV 6 => Encoded diff from included MV MB golden frame */
1 /* CODE_INTER_FOUR_MV 7 => Encoded diff from included 4 separate MV blocks */
};
short Last[3];
short PredictedDC;
int FragsAcross=pbi->HFragments;
int FromFragment,ToFragment;
int FragsDown = pbi->VFragments;
int WhichFrame;
int WhichCase;
int j,k,m,n;
void (*ExpandBlockA) ( PB_INSTANCE *pbi, ogg_int32_t FragmentNumber );
if ( pbi->FrameType == KEY_FRAME )
ExpandBlockA=ExpandKFBlock;
else
ExpandBlockA=ExpandBlock;
/* for y,u,v */
for ( j = 0; j < 3 ; j++) {
/* pick which fragments based on Y, U, V */
switch(j){
case 0: /* y */
FromFragment = 0;
ToFragment = pbi->YPlaneFragments;
FragsAcross = pbi->HFragments;
FragsDown = pbi->VFragments;
break;
case 1: /* u */
FromFragment = pbi->YPlaneFragments;
ToFragment = pbi->YPlaneFragments + pbi->UVPlaneFragments ;
FragsAcross = pbi->HFragments >> 1;
FragsDown = pbi->VFragments >> 1;
break;
/*case 2: v */
default:
FromFragment = pbi->YPlaneFragments + pbi->UVPlaneFragments;
ToFragment = pbi->YPlaneFragments + (2 * pbi->UVPlaneFragments) ;
FragsAcross = pbi->HFragments >> 1;
FragsDown = pbi->VFragments >> 1;
break;
}
/* initialize our array of last used DC Components */
for(k=0;k<3;k++)
Last[k]=0;
i=FromFragment;
/* do prediction on all of Y, U or V */
for ( m = 0 ; m < FragsDown ; m++) {
for ( n = 0 ; n < FragsAcross ; n++, i++){
/* only do prediction if the fragment is coded, or if this is a key
frame (where all fragments are intra) */
if( pbi->display_fragments[i] || (pbi->FrameType == KEY_FRAME) ){
/* Type of Fragment */
WhichFrame = Mode2Frame[pbi->FragCodingMethod[i]];
/* Check Borderline Cases */
WhichCase = (n==0) + ((m==0) << 1) + ((n+1 == FragsAcross) << 2);
fn[0]=i-1;
fn[1]=i-FragsAcross-1;
fn[2]=i-FragsAcross;
fn[3]=i-FragsAcross+1;
/* fragment valid for prediction use if coded and it comes
from same frame as the one we are predicting */
for(k=pcount=wpc=0; k<4; k++) {
int pflag;
pflag=1<<k;
if((bc_mask[WhichCase]&pflag) &&
pbi->display_fragments[fn[k]] &&
(Mode2Frame[pbi->FragCodingMethod[fn[k]]] == WhichFrame)){
v[pcount]=pbi->QFragData[fn[k]][0];
wpc|=pflag;
pcount++;
}
}
if(wpc==0){
/* fall back to the last coded fragment */
pbi->QFragData[i][0] += Last[WhichFrame];
}else{
/* don't do divide if divisor is 1 or 0 */
PredictedDC = pc[wpc][0]*v[0];
for(k=1; k<pcount; k++){
PredictedDC += pc[wpc][k]*v[k];
}
/* if we need to do a shift */
if(pc[wpc][4] != 0 ){
/* If negative add in the negative correction factor */
PredictedDC += (HIGHBITDUPPED(PredictedDC) & pc[wpc][5]);
/* Shift in lieu of a divide */
PredictedDC >>= pc[wpc][4];
}
/* check for outranging on the two predictors that can outrange */
if((wpc&(PU|PUL|PL)) == (PU|PUL|PL)){
if( abs(PredictedDC - v[2]) > 128) {
PredictedDC = v[2];
} else if( abs(PredictedDC - v[0]) > 128) {
PredictedDC = v[0];
} else if( abs(PredictedDC - v[1]) > 128) {
PredictedDC = v[1];
}
}
pbi->QFragData[i][0] += PredictedDC;
}
/* Save the last fragment coded for whatever frame we are
predicting from */
Last[WhichFrame] = pbi->QFragData[i][0];
/* Inverse DCT and reconstitute buffer in thisframe */
ExpandBlockA( pbi, i );
}
}
}
}
/* Copy the current reconstruction back to the last frame recon buffer. */
if(pbi->CodedBlockIndex > (ogg_int32_t) (pbi->UnitFragments >> 1)){
SwapReconBuffersTemp = pbi->ThisFrameRecon;
pbi->ThisFrameRecon = pbi->LastFrameRecon;
pbi->LastFrameRecon = SwapReconBuffersTemp;
CopyNotRecon( pbi, pbi->LastFrameRecon, pbi->ThisFrameRecon );
}else{
CopyRecon( pbi, pbi->LastFrameRecon, pbi->ThisFrameRecon );
}
/* Apply a loop filter to edge pixels of updated blocks */
dsp_LoopFilter(pbi->dsp, pbi, pbi->quant_info.loop_filter_limits[pbi->FrameQIndex]);
/* We may need to update the UMV border */
UpdateUMVBorder(pbi, pbi->LastFrameRecon);
/* Reconstruct the golden frame if necessary.
For VFW codec only on key frames */
if ( pbi->FrameType == KEY_FRAME ){
CopyRecon( pbi, pbi->GoldenFrame, pbi->LastFrameRecon );
/* We may need to update the UMV border */
UpdateUMVBorder(pbi, pbi->GoldenFrame);
}
}
void dsp_dct_decode_init (DspFunctions *funcs, ogg_uint32_t cpu_flags)
{
funcs->LoopFilter = LoopFilter__c;
#if defined(USE_ASM)
/* TODO: Port the DCT for MSC one day. */
#if !defined (_MSC_VER)
if (cpu_flags & OC_CPU_X86_MMX) {
dsp_mmx_dct_decode_init(funcs);
}
#endif
#endif
}
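
The DC predictor in ReconRefFrames replaces a divide by a power of two with a
biased right shift: negative sums first get pc[wpc][5] added (31, 127, ...),
then everything is shifted by pc[wpc][4]. A minimal sketch of that trick,
assuming a hypothetical helper name (div_pow2_toward_zero is not part of
libtheora) and assuming that >> on a negative int is an arithmetic shift,
which C leaves implementation-defined but mainstream compilers provide:

```c
#include <assert.h>

/* Hypothetical helper, not libtheora API: divide x by 2^shift, rounding
   toward zero like C integer division, without using a divide.
   Assumes >> on a negative int is an arithmetic (sign-extending) shift. */
static int div_pow2_toward_zero(int x, int shift){
  int bias;
  int sign_mask;
  assert(shift >= 0 && shift < 15);
  bias = (1 << shift) - 1;   /* e.g. 31 for shift 5, 127 for shift 7 */
  sign_mask = -(x < 0);      /* all ones when x is negative, else 0 */
  x += sign_mask & bias;     /* only negative values receive the bias */
  return x >> shift;
}
```

Without the bias, an arithmetic shift rounds toward negative infinity, so a
sum of -33 shifted by 5 would give -2 rather than the -1 that -33/32 yields.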


@ -1,469 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dct_encode.c 15153 2008-08-04 18:37:55Z tterribe $
********************************************************************/
#include <stdlib.h>
#include "codec_internal.h"
#include "dsp.h"
#include "quant_lookup.h"
static int ModeUsesMC[MAX_MODES] = { 0, 0, 1, 1, 1, 0, 1, 1 };
static unsigned char TokenizeDctValue (ogg_int16_t DataValue,
ogg_uint32_t * TokenListPtr ){
unsigned char tokens_added = 0;
ogg_uint32_t AbsDataVal = abs( (ogg_int32_t)DataValue );
/* Values are tokenised as category value and a number of additional
bits that define the position within the category. */
if ( DataValue == 0 ) return 0;
if ( AbsDataVal == 1 ){
if ( DataValue == 1 )
TokenListPtr[0] = ONE_TOKEN;
else
TokenListPtr[0] = MINUS_ONE_TOKEN;
tokens_added = 1;
} else if ( AbsDataVal == 2 ) {
if ( DataValue == 2 )
TokenListPtr[0] = TWO_TOKEN;
else
TokenListPtr[0] = MINUS_TWO_TOKEN;
tokens_added = 1;
} else if ( AbsDataVal <= MAX_SINGLE_TOKEN_VALUE ) {
TokenListPtr[0] = LOW_VAL_TOKENS + (AbsDataVal - DCT_VAL_CAT2_MIN);
if ( DataValue > 0 )
TokenListPtr[1] = 0;
else
TokenListPtr[1] = 1;
tokens_added = 2;
} else if ( AbsDataVal <= 8 ) {
/* Bit 1 determines sign, Bit 0 the value */
TokenListPtr[0] = DCT_VAL_CATEGORY3;
if ( DataValue > 0 )
TokenListPtr[1] = (AbsDataVal - DCT_VAL_CAT3_MIN);
else
TokenListPtr[1] = (0x02) + (AbsDataVal - DCT_VAL_CAT3_MIN);
tokens_added = 2;
} else if ( AbsDataVal <= 12 ) {
/* Bit 2 determines sign, Bit 0-1 the value */
TokenListPtr[0] = DCT_VAL_CATEGORY4;
if ( DataValue > 0 )
TokenListPtr[1] = (AbsDataVal - DCT_VAL_CAT4_MIN);
else
TokenListPtr[1] = (0x04) + (AbsDataVal - DCT_VAL_CAT4_MIN);
tokens_added = 2;
} else if ( AbsDataVal <= 20 ) {
/* Bit 3 determines sign, Bit 0-2 the value */
TokenListPtr[0] = DCT_VAL_CATEGORY5;
if ( DataValue > 0 )
TokenListPtr[1] = (AbsDataVal - DCT_VAL_CAT5_MIN);
else
TokenListPtr[1] = (0x08) + (AbsDataVal - DCT_VAL_CAT5_MIN);
tokens_added = 2;
} else if ( AbsDataVal <= 36 ) {
/* Bit 4 determines sign, Bit 0-3 the value */
TokenListPtr[0] = DCT_VAL_CATEGORY6;
if ( DataValue > 0 )
TokenListPtr[1] = (AbsDataVal - DCT_VAL_CAT6_MIN);
else
TokenListPtr[1] = (0x010) + (AbsDataVal - DCT_VAL_CAT6_MIN);
tokens_added = 2;
} else if ( AbsDataVal <= 68 ) {
/* Bit 5 determines sign, Bit 0-4 the value */
TokenListPtr[0] = DCT_VAL_CATEGORY7;
if ( DataValue > 0 )
TokenListPtr[1] = (AbsDataVal - DCT_VAL_CAT7_MIN);
else
TokenListPtr[1] = (0x20) + (AbsDataVal - DCT_VAL_CAT7_MIN);
tokens_added = 2;
} else if ( AbsDataVal <= 511 ) {
/* Bit 9 determines sign, Bit 0-8 the value */
TokenListPtr[0] = DCT_VAL_CATEGORY8;
if ( DataValue > 0 )
TokenListPtr[1] = (AbsDataVal - DCT_VAL_CAT8_MIN);
else
TokenListPtr[1] = (0x200) + (AbsDataVal - DCT_VAL_CAT8_MIN);
tokens_added = 2;
} else {
TokenListPtr[0] = DCT_VAL_CATEGORY8;
if ( DataValue > 0 )
TokenListPtr[1] = (511 - DCT_VAL_CAT8_MIN);
else
TokenListPtr[1] = (0x200) + (511 - DCT_VAL_CAT8_MIN);
tokens_added = 2;
}
/* Return the total number of tokens added */
return tokens_added;
}
static unsigned char TokenizeDctRunValue (unsigned char RunLength,
ogg_int16_t DataValue,
ogg_uint32_t * TokenListPtr ){
unsigned char tokens_added = 0;
ogg_uint32_t AbsDataVal = abs( (ogg_int32_t)DataValue );
/* Values are tokenised as category value and a number of additional
bits that define the category. */
if ( DataValue == 0 ) return 0;
if ( AbsDataVal == 1 ) {
/* Zero runs of 1-5 */
if ( RunLength <= 5 ) {
TokenListPtr[0] = DCT_RUN_CATEGORY1 + (RunLength - 1);
if ( DataValue > 0 )
TokenListPtr[1] = 0;
else
TokenListPtr[1] = 1;
} else if ( RunLength <= 9 ) {
/* Zero runs of 6-9 */
TokenListPtr[0] = DCT_RUN_CATEGORY1B;
if ( DataValue > 0 )
TokenListPtr[1] = (RunLength - 6);
else
TokenListPtr[1] = 0x04 + (RunLength - 6);
} else {
/* Zero runs of 10-17 */
TokenListPtr[0] = DCT_RUN_CATEGORY1C;
if ( DataValue > 0 )
TokenListPtr[1] = (RunLength - 10);
else
TokenListPtr[1] = 0x08 + (RunLength - 10);
}
tokens_added = 2;
} else if ( AbsDataVal <= 3 ) {
if ( RunLength == 1 ) {
TokenListPtr[0] = DCT_RUN_CATEGORY2;
/* Extra bits token bit 1 indicates sign, bit 0 indicates value */
if ( DataValue > 0 )
TokenListPtr[1] = (AbsDataVal - 2);
else
TokenListPtr[1] = (0x02) + (AbsDataVal - 2);
tokens_added = 2;
}else{
TokenListPtr[0] = DCT_RUN_CATEGORY2 + 1;
/* Extra bits token. */
/* bit 2 indicates sign, bit 1 indicates value, bit 0 indicates
run length */
if ( DataValue > 0 )
TokenListPtr[1] = ((AbsDataVal - 2) << 1) + (RunLength - 2);
else
TokenListPtr[1] = (0x04) + ((AbsDataVal - 2) << 1) + (RunLength - 2);
tokens_added = 2;
}
} else {
tokens_added = 2; /* ERROR */
/*IssueWarning( "Bad Input to TokenizeDctRunValue" );*/
}
/* Return the total number of tokens added */
return tokens_added;
}
static unsigned char TokenizeDctBlock (ogg_int16_t * RawData,
ogg_uint32_t * TokenListPtr ) {
ogg_uint32_t i;
unsigned char run_count;
unsigned char token_count = 0; /* Number of tokens created. */
ogg_uint32_t AbsData;
/* Tokenize the block */
for( i = 0; i < BLOCK_SIZE; i++ ){
run_count = 0;
/* Look for a zero run. */
/* NOTE the use of & instead of && which is faster (and
equivalent) in this instance. */
/* NO, NO IT ISN'T --Monty */
while( (i < BLOCK_SIZE) && (!RawData[i]) ){
run_count++;
i++;
}
/* If we have reached the end of the block then code EOB */
if ( i == BLOCK_SIZE ){
TokenListPtr[token_count] = DCT_EOB_TOKEN;
token_count++;
}else{
/* If we have a short zero run followed by a low data value code
the two as a composite token. */
if ( run_count ){
AbsData = abs(RawData[i]);
if ( ((AbsData == 1) && (run_count <= 17)) ||
((AbsData <= 3) && (run_count <= 3)) ) {
/* Tokenise the run and the value that follows it as one combined token */
token_count += TokenizeDctRunValue( run_count,
RawData[i],
&TokenListPtr[token_count] );
}else{
/* Else if we have a long non-EOB run or a run followed by a
value token > MAX_RUN_VAL then code the run and token
separately */
if ( run_count <= 8 )
TokenListPtr[token_count] = DCT_SHORT_ZRL_TOKEN;
else
TokenListPtr[token_count] = DCT_ZRL_TOKEN;
token_count++;
TokenListPtr[token_count] = run_count - 1;
token_count++;
/* Now tokenize the value */
token_count += TokenizeDctValue( RawData[i],
&TokenListPtr[token_count] );
}
}else{
/* Else there was NO zero run. */
/* Tokenise the value */
token_count += TokenizeDctValue( RawData[i],
&TokenListPtr[token_count] );
}
}
}
/* Return the total number of tokens (including additional bits
tokens) used. */
return token_count;
}
ogg_uint32_t DPCMTokenizeBlock (CP_INSTANCE *cpi,
ogg_int32_t FragIndex){
ogg_uint32_t token_count;
if ( cpi->pb.FrameType == KEY_FRAME ){
/* Key frame so code block in INTRA mode. */
cpi->pb.CodingMode = CODE_INTRA;
}else{
/* Get Motion vector and mode for this block. */
cpi->pb.CodingMode = cpi->pb.FragCodingMethod[FragIndex];
}
/* Tokenise the dct data. */
token_count = TokenizeDctBlock( cpi->pb.QFragData[FragIndex],
cpi->pb.TokenList[FragIndex] );
cpi->FragTokenCounts[FragIndex] = token_count;
cpi->TotTokenCount += token_count;
/* Return number of pixels coded (i.e. 8x8). */
return BLOCK_SIZE;
}
static int AllZeroDctData( Q_LIST_ENTRY * QuantList ){
ogg_uint32_t i;
for ( i = 0; i < 64; i ++ )
if ( QuantList[i] != 0 )
return 0;
return 1;
}
static void MotionBlockDifference (CP_INSTANCE * cpi, unsigned char * FiltPtr,
ogg_int16_t *DctInputPtr, ogg_int32_t MvDevisor,
unsigned char* old_ptr1, unsigned char* new_ptr1,
ogg_uint32_t FragIndex,ogg_uint32_t PixelsPerLine,
ogg_uint32_t ReconPixelsPerLine) {
ogg_int32_t MvShift;
ogg_int32_t MvModMask;
ogg_int32_t AbsRefOffset;
ogg_int32_t AbsXOffset;
ogg_int32_t AbsYOffset;
ogg_int32_t MVOffset; /* Baseline motion vector offset */
ogg_int32_t ReconPtr2Offset; /* Offset for second reconstruction in
half pixel MC */
unsigned char *ReconPtr1; /* DCT reconstructed image pointers */
unsigned char *ReconPtr2; /* Pointer used in half pixel MC */
switch(MvDevisor) {
case 2:
MvShift = 1;
MvModMask = 1;
break;
case 4:
MvShift = 2;
MvModMask = 3;
break;
default:
break;
}
cpi->MVector.x = cpi->pb.FragMVect[FragIndex].x;
cpi->MVector.y = cpi->pb.FragMVect[FragIndex].y;
/* Set up the baseline offset for the motion vector. */
MVOffset = ((cpi->MVector.y / MvDevisor) * ReconPixelsPerLine) +
(cpi->MVector.x / MvDevisor);
/* Work out the offset of the second reference position for 1/2
pixel interpolation. For the U and V planes the MV specifies 1/4
pixel accuracy. This is adjusted to 1/2 pixel as follows ( 0->0,
1/4->1/2, 1/2->1/2, 3/4->1/2 ). */
ReconPtr2Offset = 0;
AbsXOffset = cpi->MVector.x % MvDevisor;
AbsYOffset = cpi->MVector.y % MvDevisor;
if ( AbsXOffset ) {
if ( cpi->MVector.x > 0 )
ReconPtr2Offset += 1;
else
ReconPtr2Offset -= 1;
}
if ( AbsYOffset ) {
if ( cpi->MVector.y > 0 )
ReconPtr2Offset += ReconPixelsPerLine;
else
ReconPtr2Offset -= ReconPixelsPerLine;
}
if ( cpi->pb.CodingMode==CODE_GOLDEN_MV ) {
ReconPtr1 = &cpi->
pb.GoldenFrame[cpi->pb.recon_pixel_index_table[FragIndex]];
} else {
ReconPtr1 = &cpi->
pb.LastFrameRecon[cpi->pb.recon_pixel_index_table[FragIndex]];
}
ReconPtr1 += MVOffset;
ReconPtr2 = ReconPtr1 + ReconPtr2Offset;
AbsRefOffset = abs((int)(ReconPtr1 - ReconPtr2));
/* Is the MV offset exactly pixel aligned? */
if ( AbsRefOffset == 0 ){
dsp_sub8x8(cpi->dsp, FiltPtr, ReconPtr1, DctInputPtr,
PixelsPerLine, ReconPixelsPerLine);
dsp_copy8x8 (cpi->dsp, new_ptr1, old_ptr1, PixelsPerLine);
} else {
/* Fractional pixel MVs. */
/* Note that we only use two pixel values even for the diagonal */
dsp_sub8x8avg2(cpi->dsp, FiltPtr, ReconPtr1,ReconPtr2,DctInputPtr,
PixelsPerLine, ReconPixelsPerLine);
dsp_copy8x8 (cpi->dsp, new_ptr1, old_ptr1, PixelsPerLine);
}
}
void TransformQuantizeBlock (CP_INSTANCE *cpi, ogg_int32_t FragIndex,
ogg_uint32_t PixelsPerLine) {
unsigned char *new_ptr1; /* Pointers into current frame */
unsigned char *old_ptr1; /* Pointers into old frame */
unsigned char *FiltPtr; /* Pointers to srf filtered pixels */
ogg_int16_t *DctInputPtr; /* Pointer into buffer containing input to DCT */
int LeftEdge; /* Flag if block at left edge of component */
ogg_uint32_t ReconPixelsPerLine; /* Line length for recon buffers. */
unsigned char *ReconPtr1; /* DCT reconstructed image pointers */
ogg_int32_t MvDevisor; /* Defines MV resolution (2 = 1/2
pixel for Y or 4 = 1/4 for UV) */
new_ptr1 = &cpi->yuv1ptr[cpi->pb.pixel_index_table[FragIndex]];
old_ptr1 = &cpi->yuv0ptr[cpi->pb.pixel_index_table[FragIndex]];
DctInputPtr = cpi->DCTDataBuffer;
/* Set plane specific values */
if (FragIndex < (ogg_int32_t)cpi->pb.YPlaneFragments){
ReconPixelsPerLine = cpi->pb.YStride;
MvDevisor = 2; /* 1/2 pixel accuracy in Y */
}else{
ReconPixelsPerLine = cpi->pb.UVStride;
MvDevisor = 4; /* UV planes at 1/2 resolution of Y */
}
/* adjusted / filtered pointers */
FiltPtr = &cpi->ConvDestBuffer[cpi->pb.pixel_index_table[FragIndex]];
if ( cpi->pb.FrameType == KEY_FRAME ) {
/* Key frame so code block in INTRA mode. */
cpi->pb.CodingMode = CODE_INTRA;
}else{
/* Get Motion vector and mode for this block. */
cpi->pb.CodingMode = cpi->pb.FragCodingMethod[FragIndex];
}
/* Selection of Quantiser matrix and set other plane related values. */
if ( FragIndex < (ogg_int32_t)cpi->pb.YPlaneFragments ){
LeftEdge = !(FragIndex%cpi->pb.HFragments);
/* Select the appropriate Y quantiser matrix */
if ( cpi->pb.CodingMode == CODE_INTRA )
select_quantiser(&cpi->pb, BLOCK_Y);
else
select_quantiser(&cpi->pb, BLOCK_INTER_Y);
} else {
LeftEdge = !((FragIndex-cpi->pb.YPlaneFragments)%(cpi->pb.HFragments>>1));
if(FragIndex < (ogg_int32_t)cpi->pb.YPlaneFragments + (ogg_int32_t)cpi->pb.UVPlaneFragments) {
/* U plane */
if ( cpi->pb.CodingMode == CODE_INTRA )
select_quantiser(&cpi->pb, BLOCK_U);
else
select_quantiser(&cpi->pb, BLOCK_INTER_U);
} else {
/* V plane */
if ( cpi->pb.CodingMode == CODE_INTRA )
select_quantiser(&cpi->pb, BLOCK_V);
else
select_quantiser(&cpi->pb, BLOCK_INTER_V);
}
}
if ( ModeUsesMC[cpi->pb.CodingMode] ){
MotionBlockDifference(cpi, FiltPtr, DctInputPtr, MvDevisor,
old_ptr1, new_ptr1, FragIndex, PixelsPerLine,
ReconPixelsPerLine);
} else if ( (cpi->pb.CodingMode==CODE_INTER_NO_MV ) ||
( cpi->pb.CodingMode==CODE_USING_GOLDEN ) ) {
if ( cpi->pb.CodingMode==CODE_INTER_NO_MV ) {
ReconPtr1 = &cpi->
pb.LastFrameRecon[cpi->pb.recon_pixel_index_table[FragIndex]];
} else {
ReconPtr1 = &cpi->
pb.GoldenFrame[cpi->pb.recon_pixel_index_table[FragIndex]];
}
dsp_sub8x8(cpi->dsp, FiltPtr, ReconPtr1, DctInputPtr,
PixelsPerLine, ReconPixelsPerLine);
dsp_copy8x8 (cpi->dsp, new_ptr1, old_ptr1, PixelsPerLine);
} else if ( cpi->pb.CodingMode==CODE_INTRA ) {
dsp_sub8x8_128(cpi->dsp, FiltPtr, DctInputPtr, PixelsPerLine);
dsp_copy8x8 (cpi->dsp, new_ptr1, old_ptr1, PixelsPerLine);
}
/* Proceed to encode the data into the encode buffer if the encoder
is enabled. */
/* Perform a 2D DCT transform on the data. */
dsp_fdct_short(cpi->dsp, cpi->DCTDataBuffer, cpi->DCT_codes );
/* Quantize that transform data. */
quantize ( &cpi->pb, cpi->DCT_codes, cpi->pb.QFragData[FragIndex] );
if ( (cpi->pb.CodingMode == CODE_INTER_NO_MV) &&
( AllZeroDctData(cpi->pb.QFragData[FragIndex]) ) ) {
cpi->pb.display_fragments[FragIndex] = 0;
}
}
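
The value categories handled by TokenizeDctValue and ExpandToken all share one
layout for the extra bits: the low bits hold the magnitude's offset from the
category minimum and the next bit up holds the sign. A minimal round-trip
sketch of that layout follows. The helper names are hypothetical (not
libtheora API), and the example parameters used below (minimum 9 with 2 value
bits for the 9..12 range) are inferred from the range checks in the encoder
rather than read from the headers, so treat them as assumptions:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helpers, not libtheora API. cat_min is the smallest
   magnitude in the category; value_bits is the number of magnitude bits. */
static unsigned pack_extra_bits(int value, int cat_min, int value_bits){
  unsigned magnitude;
  unsigned sign;
  assert(value_bits > 0 && value_bits < 16);
  assert(abs(value) >= cat_min);
  magnitude = (unsigned)(abs(value) - cat_min);
  assert(magnitude < (1u << value_bits));   /* must fit in the value bits */
  sign = value < 0 ? 1u : 0u;
  return (sign << value_bits) | magnitude;  /* sign sits above the value */
}

static int unpack_extra_bits(unsigned extra, int cat_min, int value_bits){
  unsigned mask = (1u << value_bits) - 1u;
  int magnitude = cat_min + (int)(extra & mask);
  return ((extra >> value_bits) & 1u) ? -magnitude : magnitude;
}
```

With those assumed parameters, -11 packs to 0x04 + 2 = 6, which matches the
`(0x04) + (AbsDataVal - DCT_VAL_CAT4_MIN)` form in the encoder if the category
minimum is 9.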


@ -1,422 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dsp.c 15427 2008-10-21 02:36:19Z xiphmont $
********************************************************************/
#include <stdlib.h>
#include "codec_internal.h"
#include "../cpu.c"
#define DSP_OP_AVG(a,b) ((((int)(a)) + ((int)(b)))/2)
#define DSP_OP_DIFF(a,b) (((int)(a)) - ((int)(b)))
#define DSP_OP_ABS_DIFF(a,b) abs((((int)(a)) - ((int)(b))))
static void sub8x8__c (unsigned char *FiltPtr, unsigned char *ReconPtr,
ogg_int16_t *DctInputPtr, ogg_uint32_t PixelsPerLine,
ogg_uint32_t ReconPixelsPerLine) {
int i;
/* For each block row */
for (i=8; i; i--) {
DctInputPtr[0] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[0], ReconPtr[0]);
DctInputPtr[1] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[1], ReconPtr[1]);
DctInputPtr[2] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[2], ReconPtr[2]);
DctInputPtr[3] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[3], ReconPtr[3]);
DctInputPtr[4] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[4], ReconPtr[4]);
DctInputPtr[5] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[5], ReconPtr[5]);
DctInputPtr[6] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[6], ReconPtr[6]);
DctInputPtr[7] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[7], ReconPtr[7]);
/* Start next row */
FiltPtr += PixelsPerLine;
ReconPtr += ReconPixelsPerLine;
DctInputPtr += 8;
}
}
static void sub8x8_128__c (unsigned char *FiltPtr, ogg_int16_t *DctInputPtr,
ogg_uint32_t PixelsPerLine) {
int i;
/* For each block row */
for (i=8; i; i--) {
/* INTRA mode so code raw image data */
/* We convert the data to 8 bit signed (by subtracting 128) as
this reduces the internal precision requirements in the DCT
transform. */
DctInputPtr[0] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[0], 128);
DctInputPtr[1] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[1], 128);
DctInputPtr[2] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[2], 128);
DctInputPtr[3] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[3], 128);
DctInputPtr[4] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[4], 128);
DctInputPtr[5] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[5], 128);
DctInputPtr[6] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[6], 128);
DctInputPtr[7] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[7], 128);
/* Start next row */
FiltPtr += PixelsPerLine;
DctInputPtr += 8;
}
}
static void sub8x8avg2__c (unsigned char *FiltPtr, unsigned char *ReconPtr1,
unsigned char *ReconPtr2, ogg_int16_t *DctInputPtr,
ogg_uint32_t PixelsPerLine,
ogg_uint32_t ReconPixelsPerLine)
{
int i;
/* For each block row */
for (i=8; i; i--) {
DctInputPtr[0] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[0], DSP_OP_AVG (ReconPtr1[0], ReconPtr2[0]));
DctInputPtr[1] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[1], DSP_OP_AVG (ReconPtr1[1], ReconPtr2[1]));
DctInputPtr[2] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[2], DSP_OP_AVG (ReconPtr1[2], ReconPtr2[2]));
DctInputPtr[3] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[3], DSP_OP_AVG (ReconPtr1[3], ReconPtr2[3]));
DctInputPtr[4] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[4], DSP_OP_AVG (ReconPtr1[4], ReconPtr2[4]));
DctInputPtr[5] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[5], DSP_OP_AVG (ReconPtr1[5], ReconPtr2[5]));
DctInputPtr[6] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[6], DSP_OP_AVG (ReconPtr1[6], ReconPtr2[6]));
DctInputPtr[7] = (ogg_int16_t) DSP_OP_DIFF (FiltPtr[7], DSP_OP_AVG (ReconPtr1[7], ReconPtr2[7]));
/* Start next row */
FiltPtr += PixelsPerLine;
ReconPtr1 += ReconPixelsPerLine;
ReconPtr2 += ReconPixelsPerLine;
DctInputPtr += 8;
}
}
static ogg_uint32_t row_sad8__c (unsigned char *Src1, unsigned char *Src2)
{
ogg_uint32_t SadValue;
ogg_uint32_t SadValue1;
SadValue = DSP_OP_ABS_DIFF (Src1[0], Src2[0]) +
DSP_OP_ABS_DIFF (Src1[1], Src2[1]) +
DSP_OP_ABS_DIFF (Src1[2], Src2[2]) +
DSP_OP_ABS_DIFF (Src1[3], Src2[3]);
SadValue1 = DSP_OP_ABS_DIFF (Src1[4], Src2[4]) +
DSP_OP_ABS_DIFF (Src1[5], Src2[5]) +
DSP_OP_ABS_DIFF (Src1[6], Src2[6]) +
DSP_OP_ABS_DIFF (Src1[7], Src2[7]);
SadValue = ( SadValue > SadValue1 ) ? SadValue : SadValue1;
return SadValue;
}
static ogg_uint32_t col_sad8x8__c (unsigned char *Src1, unsigned char *Src2,
ogg_uint32_t stride)
{
ogg_uint32_t SadValue[8] = {0,0,0,0,0,0,0,0};
ogg_uint32_t SadValue2[8] = {0,0,0,0,0,0,0,0};
ogg_uint32_t MaxSad = 0;
ogg_uint32_t i;
for ( i = 0; i < 4; i++ ){
SadValue[0] += abs(Src1[0] - Src2[0]);
SadValue[1] += abs(Src1[1] - Src2[1]);
SadValue[2] += abs(Src1[2] - Src2[2]);
SadValue[3] += abs(Src1[3] - Src2[3]);
SadValue[4] += abs(Src1[4] - Src2[4]);
SadValue[5] += abs(Src1[5] - Src2[5]);
SadValue[6] += abs(Src1[6] - Src2[6]);
SadValue[7] += abs(Src1[7] - Src2[7]);
Src1 += stride;
Src2 += stride;
}
for ( i = 0; i < 4; i++ ){
SadValue2[0] += abs(Src1[0] - Src2[0]);
SadValue2[1] += abs(Src1[1] - Src2[1]);
SadValue2[2] += abs(Src1[2] - Src2[2]);
SadValue2[3] += abs(Src1[3] - Src2[3]);
SadValue2[4] += abs(Src1[4] - Src2[4]);
SadValue2[5] += abs(Src1[5] - Src2[5]);
SadValue2[6] += abs(Src1[6] - Src2[6]);
SadValue2[7] += abs(Src1[7] - Src2[7]);
Src1 += stride;
Src2 += stride;
}
for ( i = 0; i < 8; i++ ){
if ( SadValue[i] > MaxSad )
MaxSad = SadValue[i];
if ( SadValue2[i] > MaxSad )
MaxSad = SadValue2[i];
}
return MaxSad;
}
static ogg_uint32_t sad8x8__c (unsigned char *ptr1, ogg_uint32_t stride1,
unsigned char *ptr2, ogg_uint32_t stride2)
{
ogg_uint32_t i;
ogg_uint32_t sad = 0;
for (i=8; i; i--) {
sad += DSP_OP_ABS_DIFF(ptr1[0], ptr2[0]);
sad += DSP_OP_ABS_DIFF(ptr1[1], ptr2[1]);
sad += DSP_OP_ABS_DIFF(ptr1[2], ptr2[2]);
sad += DSP_OP_ABS_DIFF(ptr1[3], ptr2[3]);
sad += DSP_OP_ABS_DIFF(ptr1[4], ptr2[4]);
sad += DSP_OP_ABS_DIFF(ptr1[5], ptr2[5]);
sad += DSP_OP_ABS_DIFF(ptr1[6], ptr2[6]);
sad += DSP_OP_ABS_DIFF(ptr1[7], ptr2[7]);
/* Step to next row of block. */
ptr1 += stride1;
ptr2 += stride2;
}
return sad;
}
static ogg_uint32_t sad8x8_thres__c (unsigned char *ptr1, ogg_uint32_t stride1,
unsigned char *ptr2, ogg_uint32_t stride2,
ogg_uint32_t thres)
{
ogg_uint32_t i;
ogg_uint32_t sad = 0;
for (i=8; i; i--) {
sad += DSP_OP_ABS_DIFF(ptr1[0], ptr2[0]);
sad += DSP_OP_ABS_DIFF(ptr1[1], ptr2[1]);
sad += DSP_OP_ABS_DIFF(ptr1[2], ptr2[2]);
sad += DSP_OP_ABS_DIFF(ptr1[3], ptr2[3]);
sad += DSP_OP_ABS_DIFF(ptr1[4], ptr2[4]);
sad += DSP_OP_ABS_DIFF(ptr1[5], ptr2[5]);
sad += DSP_OP_ABS_DIFF(ptr1[6], ptr2[6]);
sad += DSP_OP_ABS_DIFF(ptr1[7], ptr2[7]);
if (sad > thres )
break;
/* Step to next row of block. */
ptr1 += stride1;
ptr2 += stride2;
}
return sad;
}
static ogg_uint32_t sad8x8_xy2_thres__c (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr1,
unsigned char *RefDataPtr2, ogg_uint32_t RefStride,
ogg_uint32_t thres)
{
ogg_uint32_t i;
ogg_uint32_t sad = 0;
for (i=8; i; i--) {
sad += DSP_OP_ABS_DIFF(SrcData[0], DSP_OP_AVG (RefDataPtr1[0], RefDataPtr2[0]));
sad += DSP_OP_ABS_DIFF(SrcData[1], DSP_OP_AVG (RefDataPtr1[1], RefDataPtr2[1]));
sad += DSP_OP_ABS_DIFF(SrcData[2], DSP_OP_AVG (RefDataPtr1[2], RefDataPtr2[2]));
sad += DSP_OP_ABS_DIFF(SrcData[3], DSP_OP_AVG (RefDataPtr1[3], RefDataPtr2[3]));
sad += DSP_OP_ABS_DIFF(SrcData[4], DSP_OP_AVG (RefDataPtr1[4], RefDataPtr2[4]));
sad += DSP_OP_ABS_DIFF(SrcData[5], DSP_OP_AVG (RefDataPtr1[5], RefDataPtr2[5]));
sad += DSP_OP_ABS_DIFF(SrcData[6], DSP_OP_AVG (RefDataPtr1[6], RefDataPtr2[6]));
sad += DSP_OP_ABS_DIFF(SrcData[7], DSP_OP_AVG (RefDataPtr1[7], RefDataPtr2[7]));
if ( sad > thres )
break;
/* Step to next row of block. */
SrcData += SrcStride;
RefDataPtr1 += RefStride;
RefDataPtr2 += RefStride;
}
return sad;
}
static ogg_uint32_t intra8x8_err__c (unsigned char *DataPtr, ogg_uint32_t Stride)
{
ogg_uint32_t i;
ogg_uint32_t XSum=0;
ogg_uint32_t XXSum=0;
for (i=8; i; i--) {
/* Accumulate the sum and sum of squares over every pixel in the row. */
XSum += DataPtr[0];
XXSum += DataPtr[0]*DataPtr[0];
XSum += DataPtr[1];
XXSum += DataPtr[1]*DataPtr[1];
XSum += DataPtr[2];
XXSum += DataPtr[2]*DataPtr[2];
XSum += DataPtr[3];
XXSum += DataPtr[3]*DataPtr[3];
XSum += DataPtr[4];
XXSum += DataPtr[4]*DataPtr[4];
XSum += DataPtr[5];
XXSum += DataPtr[5]*DataPtr[5];
XSum += DataPtr[6];
XXSum += DataPtr[6]*DataPtr[6];
XSum += DataPtr[7];
XXSum += DataPtr[7]*DataPtr[7];
/* Step to next row of block. */
DataPtr += Stride;
}
/* Compute the population variance, scaled by 64*64 (N = 64 pixels),
as the mismatch metric. */
return (( (XXSum<<6) - XSum*XSum ) );
}
static ogg_uint32_t inter8x8_err__c (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr, ogg_uint32_t RefStride)
{
ogg_uint32_t i;
ogg_uint32_t XSum=0;
ogg_uint32_t XXSum=0;
ogg_int32_t DiffVal;
for (i=8; i; i--) {
DiffVal = DSP_OP_DIFF (SrcData[0], RefDataPtr[0]);
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF (SrcData[1], RefDataPtr[1]);
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF (SrcData[2], RefDataPtr[2]);
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF (SrcData[3], RefDataPtr[3]);
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF (SrcData[4], RefDataPtr[4]);
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF (SrcData[5], RefDataPtr[5]);
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF (SrcData[6], RefDataPtr[6]);
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF (SrcData[7], RefDataPtr[7]);
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
/* Step to next row of block. */
SrcData += SrcStride;
RefDataPtr += RefStride;
}
/* Return 64*sum(d^2) - sum(d)^2, i.e. the population variance of the
differences scaled by 64*64, as the mismatch metric. */
return (( (XXSum<<6) - XSum*XSum ));
}
static ogg_uint32_t inter8x8_err_xy2__c (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr1,
unsigned char *RefDataPtr2, ogg_uint32_t RefStride)
{
ogg_uint32_t i;
ogg_uint32_t XSum=0;
ogg_uint32_t XXSum=0;
ogg_int32_t DiffVal;
for (i=8; i; i--) {
DiffVal = DSP_OP_DIFF(SrcData[0], DSP_OP_AVG (RefDataPtr1[0], RefDataPtr2[0]));
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF(SrcData[1], DSP_OP_AVG (RefDataPtr1[1], RefDataPtr2[1]));
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF(SrcData[2], DSP_OP_AVG (RefDataPtr1[2], RefDataPtr2[2]));
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF(SrcData[3], DSP_OP_AVG (RefDataPtr1[3], RefDataPtr2[3]));
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF(SrcData[4], DSP_OP_AVG (RefDataPtr1[4], RefDataPtr2[4]));
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF(SrcData[5], DSP_OP_AVG (RefDataPtr1[5], RefDataPtr2[5]));
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF(SrcData[6], DSP_OP_AVG (RefDataPtr1[6], RefDataPtr2[6]));
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
DiffVal = DSP_OP_DIFF(SrcData[7], DSP_OP_AVG (RefDataPtr1[7], RefDataPtr2[7]));
XSum += DiffVal;
XXSum += DiffVal*DiffVal;
/* Step to next row of block. */
SrcData += SrcStride;
RefDataPtr1 += RefStride;
RefDataPtr2 += RefStride;
}
/* Return 64*sum(d^2) - sum(d)^2, i.e. the population variance of the
differences scaled by 64*64, as the mismatch metric. */
return (( (XXSum<<6) - XSum*XSum ));
}
static void nop (void) { /* NOP */ }
void dsp_init(DspFunctions *funcs)
{
funcs->save_fpu = nop;
funcs->restore_fpu = nop;
funcs->sub8x8 = sub8x8__c;
funcs->sub8x8_128 = sub8x8_128__c;
funcs->sub8x8avg2 = sub8x8avg2__c;
funcs->row_sad8 = row_sad8__c;
funcs->col_sad8x8 = col_sad8x8__c;
funcs->sad8x8 = sad8x8__c;
funcs->sad8x8_thres = sad8x8_thres__c;
funcs->sad8x8_xy2_thres = sad8x8_xy2_thres__c;
funcs->intra8x8_err = intra8x8_err__c;
funcs->inter8x8_err = inter8x8_err__c;
funcs->inter8x8_err_xy2 = inter8x8_err_xy2__c;
}
void dsp_static_init(DspFunctions *funcs)
{
ogg_uint32_t cpuflags;
cpuflags = oc_cpu_flags_get ();
dsp_init (funcs);
dsp_recon_init (funcs, cpuflags);
dsp_dct_init (funcs, cpuflags);
#if defined(USE_ASM)
if (cpuflags & OC_CPU_X86_MMX) {
dsp_mmx_init(funcs);
}
# ifndef WIN32
/* This is not implemented for win32 yet */
if (cpuflags & OC_CPU_X86_MMXEXT) {
dsp_mmxext_init(funcs);
}
# endif
#endif
}
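
The three *_err routines above all return the same quantity, 64*sum(x^2) - sum(x)^2, which is the population variance of the 64 samples scaled by 64*64. A minimal sketch of that computation with a plain loop instead of the unrolled rows follows. The name block_err_8x8 is hypothetical and is not part of libtheora; it only illustrates the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libtheora function): returns
   64*sum(x^2) - sum(x)^2 over an 8x8 block of pixels. For 64 samples
   this equals 64*64 times the population variance, so a perfectly flat
   block scores 0. */
static uint32_t block_err_8x8(const unsigned char *data, uint32_t stride)
{
  uint32_t sum = 0;
  uint32_t sum_sq = 0;
  int row, col;

  assert(data != NULL);
  for (row = 0; row < 8; row++) {
    for (col = 0; col < 8; col++) {
      sum += data[col];
      sum_sq += (uint32_t)data[col] * data[col];
    }
    data += stride;  /* step to the next row of the block */
  }
  /* Worst case (all pixels 255) stays below 2^32, so nothing overflows. */
  return (sum_sq << 6) - sum * sum;
}
```

Skipping the division by 64*64 keeps the metric in integers; the result is only compared against other values computed the same way.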

View file

@ -1,166 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dsp.h 15153 2008-08-04 18:37:55Z tterribe $
********************************************************************/
#ifndef DSP_H
#define DSP_H
#include "theora/theora.h"
#include "../cpu.h"
typedef struct
{
void (*save_fpu) (void);
void (*restore_fpu) (void);
void (*sub8x8) (unsigned char *FiltPtr, unsigned char *ReconPtr,
ogg_int16_t *DctInputPtr, ogg_uint32_t PixelsPerLine,
ogg_uint32_t ReconPixelsPerLine);
void (*sub8x8_128) (unsigned char *FiltPtr, ogg_int16_t *DctInputPtr,
ogg_uint32_t PixelsPerLine);
void (*sub8x8avg2) (unsigned char *FiltPtr, unsigned char *ReconPtr1,
unsigned char *ReconPtr2, ogg_int16_t *DctInputPtr,
ogg_uint32_t PixelsPerLine,
ogg_uint32_t ReconPixelsPerLine);
void (*copy8x8) (unsigned char *src, unsigned char *dest,
ogg_uint32_t stride);
void (*recon_intra8x8) (unsigned char *ReconPtr, ogg_int16_t *ChangePtr,
ogg_uint32_t LineStep);
void (*recon_inter8x8) (unsigned char *ReconPtr, unsigned char *RefPtr,
ogg_int16_t *ChangePtr, ogg_uint32_t LineStep);
void (*recon_inter8x8_half) (unsigned char *ReconPtr, unsigned char *RefPtr1,
unsigned char *RefPtr2, ogg_int16_t *ChangePtr,
ogg_uint32_t LineStep);
void (*fdct_short) (ogg_int16_t *InputData, ogg_int16_t *OutputData);
ogg_uint32_t (*row_sad8) (unsigned char *Src1, unsigned char *Src2);
ogg_uint32_t (*col_sad8x8) (unsigned char *Src1, unsigned char *Src2,
ogg_uint32_t stride);
ogg_uint32_t (*sad8x8) (unsigned char *ptr1, ogg_uint32_t stride1,
unsigned char *ptr2, ogg_uint32_t stride2);
ogg_uint32_t (*sad8x8_thres) (unsigned char *ptr1, ogg_uint32_t stride1,
unsigned char *ptr2, ogg_uint32_t stride2,
ogg_uint32_t thres);
ogg_uint32_t (*sad8x8_xy2_thres)(unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr1,
unsigned char *RefDataPtr2, ogg_uint32_t RefStride,
ogg_uint32_t thres);
ogg_uint32_t (*intra8x8_err) (unsigned char *DataPtr, ogg_uint32_t Stride);
ogg_uint32_t (*inter8x8_err) (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr, ogg_uint32_t RefStride);
ogg_uint32_t (*inter8x8_err_xy2)(unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr1,
unsigned char *RefDataPtr2, ogg_uint32_t RefStride);
void (*LoopFilter) (PB_INSTANCE *pbi, int FLimit);
void (*FilterVert) (unsigned char * PixelPtr,
ogg_int32_t LineLength, ogg_int16_t *BoundingValuePtr);
void (*IDctSlow) (ogg_int16_t *InputData,
ogg_int16_t *QuantMatrix, ogg_int16_t *OutputData);
void (*IDct3) (ogg_int16_t *InputData,
ogg_int16_t *QuantMatrix, ogg_int16_t *OutputData);
void (*IDct10) (ogg_int16_t *InputData,
ogg_int16_t *QuantMatrix, ogg_int16_t *OutputData);
} DspFunctions;
extern void dsp_dct_init(DspFunctions *funcs, ogg_uint32_t cpu_flags);
extern void dsp_recon_init (DspFunctions *funcs, ogg_uint32_t cpu_flags);
extern void dsp_dct_decode_init(DspFunctions *funcs, ogg_uint32_t cpu_flags);
extern void dsp_idct_init(DspFunctions *funcs, ogg_uint32_t cpu_flags);
void dsp_init(DspFunctions *funcs);
void dsp_static_init(DspFunctions *funcs);
#if defined(USE_ASM) && (defined(__i386__) || defined(__x86_64__) || defined(WIN32))
extern void dsp_mmx_init(DspFunctions *funcs);
extern void dsp_mmxext_init(DspFunctions *funcs);
extern void dsp_mmx_fdct_init(DspFunctions *funcs);
extern void dsp_mmx_recon_init(DspFunctions *funcs);
extern void dsp_mmx_dct_decode_init(DspFunctions *funcs);
extern void dsp_mmx_idct_init(DspFunctions *funcs);
#endif
#define dsp_save_fpu(funcs) (funcs.save_fpu ())
#define dsp_restore_fpu(funcs) (funcs.restore_fpu ())
#define dsp_sub8x8(funcs,a1,a2,a3,a4,a5) (funcs.sub8x8 (a1,a2,a3,a4,a5))
#define dsp_sub8x8_128(funcs,a1,a2,a3) (funcs.sub8x8_128 (a1,a2,a3))
#define dsp_sub8x8avg2(funcs,a1,a2,a3,a4,a5,a6) (funcs.sub8x8avg2 (a1,a2,a3,a4,a5,a6))
#define dsp_copy8x8(funcs,ptr1,ptr2,str1) (funcs.copy8x8 (ptr1,ptr2,str1))
#define dsp_recon_intra8x8(funcs,ptr1,ptr2,str1) (funcs.recon_intra8x8 (ptr1,ptr2,str1))
#define dsp_recon_inter8x8(funcs,ptr1,ptr2,ptr3,str1) \
(funcs.recon_inter8x8 (ptr1,ptr2,ptr3,str1))
#define dsp_recon_inter8x8_half(funcs,ptr1,ptr2,ptr3,ptr4,str1) \
(funcs.recon_inter8x8_half (ptr1,ptr2,ptr3,ptr4,str1))
#define dsp_fdct_short(funcs,in,out) (funcs.fdct_short (in,out))
#define dsp_row_sad8(funcs,ptr1,ptr2) (funcs.row_sad8 (ptr1,ptr2))
#define dsp_col_sad8x8(funcs,ptr1,ptr2,str1) (funcs.col_sad8x8 (ptr1,ptr2,str1))
#define dsp_sad8x8(funcs,ptr1,str1,ptr2,str2) (funcs.sad8x8 (ptr1,str1,ptr2,str2))
#define dsp_sad8x8_thres(funcs,ptr1,str1,ptr2,str2,t) (funcs.sad8x8_thres (ptr1,str1,ptr2,str2,t))
#define dsp_sad8x8_xy2_thres(funcs,ptr1,str1,ptr2,ptr3,str2,t) \
(funcs.sad8x8_xy2_thres (ptr1,str1,ptr2,ptr3,str2,t))
#define dsp_intra8x8_err(funcs,ptr1,str1) (funcs.intra8x8_err (ptr1,str1))
#define dsp_inter8x8_err(funcs,ptr1,str1,ptr2,str2) \
(funcs.inter8x8_err (ptr1,str1,ptr2,str2))
#define dsp_inter8x8_err_xy2(funcs,ptr1,str1,ptr2,ptr3,str2) \
(funcs.inter8x8_err_xy2 (ptr1,str1,ptr2,ptr3,str2))
#define dsp_LoopFilter(funcs, ptr1, i) \
(funcs.LoopFilter(ptr1, i))
#define dsp_IDctSlow(funcs, ptr1, ptr2, ptr3) \
(funcs.IDctSlow(ptr1, ptr2, ptr3))
#define dsp_IDct3(funcs, ptr1, ptr2, ptr3) \
(funcs.IDct3(ptr1, ptr2, ptr3))
#define dsp_IDct10(funcs, ptr1, ptr2, ptr3) \
(funcs.IDct10(ptr1, ptr2, ptr3))
#endif /* DSP_H */
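
The header above routes every DSP call through a table of function pointers that is first filled with portable C routines and then selectively overridden according to CPU flags. The sketch below shows that pattern in miniature. Every name in it (DemoDsp, DEMO_CPU_FAST, demo_sad8_c, demo_sad8_fast, demo_dsp_init, demo_sad8) is hypothetical and chosen for illustration; none of them exists in libtheora.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU capability bit, standing in for flags like OC_CPU_X86_MMX. */
#define DEMO_CPU_FAST 0x1u

/* Hypothetical one-entry dispatch table shaped like DspFunctions. */
typedef struct {
  uint32_t (*sad8)(const unsigned char *a, const unsigned char *b);
} DemoDsp;

/* Portable fallback: sum of absolute differences over 8 pixels. */
static uint32_t demo_sad8_c(const unsigned char *a, const unsigned char *b)
{
  uint32_t sad = 0;
  int i;
  for (i = 0; i < 8; i++)
    sad += (uint32_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
  return sad;
}

/* Stand-in for an optimized routine; here it simply reuses the C version. */
static uint32_t demo_sad8_fast(const unsigned char *a, const unsigned char *b)
{
  return demo_sad8_c(a, b);
}

/* Install the C routine first, then override it if the CPU allows. */
static void demo_dsp_init(DemoDsp *funcs, uint32_t cpu_flags)
{
  assert(funcs != NULL);
  funcs->sad8 = demo_sad8_c;
  if (cpu_flags & DEMO_CPU_FAST)
    funcs->sad8 = demo_sad8_fast;
}

/* Call-site macro in the style of the dsp_* wrappers. */
#define demo_sad8(funcs, p1, p2) ((funcs).sad8((p1), (p2)))
```

Because callers only see the macro, an optimized routine can be swapped in at initialization time without touching any call site.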

File diff suppressed because it is too large

View file

@ -1,310 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: encoder_huffman.c 13884 2007-09-22 08:38:10Z giles $
********************************************************************/
#include <stdlib.h>
#include <stdio.h>
#include "codec_internal.h"
#include "hufftables.h"
static void CreateHuffmanList(HUFF_ENTRY ** HuffRoot,
ogg_uint32_t HIndex,
const ogg_uint32_t *FreqList ) {
int i;
HUFF_ENTRY *entry_ptr;
HUFF_ENTRY *search_ptr;
/* Create a HUFF entry for token zero. */
HuffRoot[HIndex] = (HUFF_ENTRY *)_ogg_calloc(1,sizeof(*HuffRoot[HIndex]));
HuffRoot[HIndex]->Previous = NULL;
HuffRoot[HIndex]->Next = NULL;
HuffRoot[HIndex]->ZeroChild = NULL;
HuffRoot[HIndex]->OneChild = NULL;
HuffRoot[HIndex]->Value = 0;
HuffRoot[HIndex]->Frequency = FreqList[0];
if ( HuffRoot[HIndex]->Frequency == 0 )
HuffRoot[HIndex]->Frequency = 1;
/* Now add entries for all the other possible tokens. */
for ( i = 1; i < MAX_ENTROPY_TOKENS; i++ ) {
entry_ptr = (HUFF_ENTRY *)_ogg_calloc(1,sizeof(*entry_ptr));
entry_ptr->Value = i;
entry_ptr->Frequency = FreqList[i];
entry_ptr->ZeroChild = NULL;
entry_ptr->OneChild = NULL;
/* Force min value of 1. This prevents the tree getting too deep. */
if ( entry_ptr->Frequency == 0 )
entry_ptr->Frequency = 1;
if ( entry_ptr->Frequency <= HuffRoot[HIndex]->Frequency ){
entry_ptr->Next = HuffRoot[HIndex];
HuffRoot[HIndex]->Previous = entry_ptr;
entry_ptr->Previous = NULL;
HuffRoot[HIndex] = entry_ptr;
}else{
search_ptr = HuffRoot[HIndex];
while ( (search_ptr->Next != NULL) &&
(search_ptr->Frequency < entry_ptr->Frequency) ){
search_ptr = (HUFF_ENTRY *)search_ptr->Next;
}
if ( search_ptr->Frequency < entry_ptr->Frequency ){
entry_ptr->Next = NULL;
entry_ptr->Previous = search_ptr;
search_ptr->Next = entry_ptr;
}else{
entry_ptr->Next = search_ptr;
entry_ptr->Previous = search_ptr->Previous;
search_ptr->Previous->Next = entry_ptr;
search_ptr->Previous = entry_ptr;
}
}
}
}
static void CreateCodeArray( HUFF_ENTRY * HuffRoot,
ogg_uint32_t *HuffCodeArray,
unsigned char *HuffCodeLengthArray,
ogg_uint32_t CodeValue,
unsigned char CodeLength ) {
/* If we are at a leaf then fill in a code array entry. */
if ( ( HuffRoot->ZeroChild == NULL ) && ( HuffRoot->OneChild == NULL ) ){
HuffCodeArray[HuffRoot->Value] = CodeValue;
HuffCodeLengthArray[HuffRoot->Value] = CodeLength;
}else{
/* Recursive calls to scan down the tree. */
CodeLength++;
CreateCodeArray(HuffRoot->ZeroChild, HuffCodeArray, HuffCodeLengthArray,
((CodeValue << 1) + 0), CodeLength);
CreateCodeArray(HuffRoot->OneChild, HuffCodeArray, HuffCodeLengthArray,
((CodeValue << 1) + 1), CodeLength);
}
}
static void BuildHuffmanTree( HUFF_ENTRY **HuffRoot,
ogg_uint32_t *HuffCodeArray,
unsigned char *HuffCodeLengthArray,
ogg_uint32_t HIndex,
const ogg_uint32_t *FreqList ){
HUFF_ENTRY *entry_ptr;
HUFF_ENTRY *search_ptr;
/* First create a sorted linked list representing the frequencies of
each token. */
CreateHuffmanList( HuffRoot, HIndex, FreqList );
/* Now build the tree from the list. */
/* While there are at least two items left in the list. */
while ( HuffRoot[HIndex]->Next != NULL ){
/* Create the new node as the parent of the first two in the list. */
entry_ptr = (HUFF_ENTRY *)_ogg_calloc(1,sizeof(*entry_ptr));
entry_ptr->Value = -1;
entry_ptr->Frequency = HuffRoot[HIndex]->Frequency +
HuffRoot[HIndex]->Next->Frequency ;
entry_ptr->ZeroChild = HuffRoot[HIndex];
entry_ptr->OneChild = HuffRoot[HIndex]->Next;
/* If there are still more items in the list then insert the new
node into the list. */
if (entry_ptr->OneChild->Next != NULL ){
/* Set up the provisional 'new root' */
HuffRoot[HIndex] = entry_ptr->OneChild->Next;
HuffRoot[HIndex]->Previous = NULL;
/* Now scan through the remaining list to insert the new entry
at the appropriate point. */
if ( entry_ptr->Frequency <= HuffRoot[HIndex]->Frequency ){
entry_ptr->Next = HuffRoot[HIndex];
HuffRoot[HIndex]->Previous = entry_ptr;
entry_ptr->Previous = NULL;
HuffRoot[HIndex] = entry_ptr;
}else{
search_ptr = HuffRoot[HIndex];
while ( (search_ptr->Next != NULL) &&
(search_ptr->Frequency < entry_ptr->Frequency) ){
search_ptr = search_ptr->Next;
}
if ( search_ptr->Frequency < entry_ptr->Frequency ){
entry_ptr->Next = NULL;
entry_ptr->Previous = search_ptr;
search_ptr->Next = entry_ptr;
}else{
entry_ptr->Next = search_ptr;
entry_ptr->Previous = search_ptr->Previous;
search_ptr->Previous->Next = entry_ptr;
search_ptr->Previous = entry_ptr;
}
}
}else{
/* Build has finished. */
entry_ptr->Next = NULL;
entry_ptr->Previous = NULL;
HuffRoot[HIndex] = entry_ptr;
}
/* Clear the Next/Previous links of the children (probably not necessary). */
entry_ptr->ZeroChild->Next = NULL;
entry_ptr->ZeroChild->Previous = NULL;
entry_ptr->OneChild->Next = NULL;
entry_ptr->OneChild->Previous = NULL;
}
/* Now build a code array from the tree. */
CreateCodeArray( HuffRoot[HIndex], HuffCodeArray,
HuffCodeLengthArray, 0, 0);
}
static void DestroyHuffTree(HUFF_ENTRY *root_ptr){
if (root_ptr){
if ( root_ptr->ZeroChild )
DestroyHuffTree(root_ptr->ZeroChild);
if ( root_ptr->OneChild )
DestroyHuffTree(root_ptr->OneChild);
_ogg_free(root_ptr);
}
}
void ClearHuffmanSet( PB_INSTANCE *pbi ){
int i;
ClearHuffmanTrees(pbi->HuffRoot_VP3x);
/* Free each table and reset its pointer so that a repeated call cannot
free the same memory twice. */
for ( i = 0; i < NUM_HUFF_TABLES; i++ ){
if (pbi->HuffCodeArray_VP3x[i]){
_ogg_free (pbi->HuffCodeArray_VP3x[i]);
pbi->HuffCodeArray_VP3x[i] = NULL;
}
if (pbi->HuffCodeLengthArray_VP3x[i]){
_ogg_free (pbi->HuffCodeLengthArray_VP3x[i]);
pbi->HuffCodeLengthArray_VP3x[i] = NULL;
}
}
}
void InitHuffmanSet( PB_INSTANCE *pbi ){
int i;
ClearHuffmanSet(pbi);
pbi->ExtraBitLengths_VP3x = ExtraBitLengths_VP31;
for ( i = 0; i < NUM_HUFF_TABLES; i++ ){
pbi->HuffCodeArray_VP3x[i] =
_ogg_calloc(MAX_ENTROPY_TOKENS,
sizeof(*pbi->HuffCodeArray_VP3x[i]));
pbi->HuffCodeLengthArray_VP3x[i] =
_ogg_calloc(MAX_ENTROPY_TOKENS,
sizeof(*pbi->HuffCodeLengthArray_VP3x[i]));
BuildHuffmanTree( pbi->HuffRoot_VP3x,
pbi->HuffCodeArray_VP3x[i],
pbi->HuffCodeLengthArray_VP3x[i],
i, FrequencyCounts_VP3[i]);
}
}
static int ReadHuffTree(HUFF_ENTRY * HuffRoot, int depth,
oggpack_buffer *opb) {
long bit;
long ret;
theora_read(opb,1,&bit);
if(bit < 0) return OC_BADHEADER;
else if(!bit) {
int ret;
if (++depth > 32) return OC_BADHEADER;
HuffRoot->ZeroChild = (HUFF_ENTRY *)_ogg_calloc(1, sizeof(HUFF_ENTRY));
ret = ReadHuffTree(HuffRoot->ZeroChild, depth, opb);
if (ret < 0) return ret;
HuffRoot->OneChild = (HUFF_ENTRY *)_ogg_calloc(1, sizeof(HUFF_ENTRY));
ret = ReadHuffTree(HuffRoot->OneChild, depth, opb);
if (ret < 0) return ret;
HuffRoot->Value = -1;
} else {
HuffRoot->ZeroChild = NULL;
HuffRoot->OneChild = NULL;
theora_read(opb,5,&ret);
HuffRoot->Value=ret;
if (HuffRoot->Value < 0) return OC_BADHEADER;
}
return 0;
}
int ReadHuffmanTrees(codec_setup_info *ci, oggpack_buffer *opb) {
int i;
for (i=0; i<NUM_HUFF_TABLES; i++) {
int ret;
ci->HuffRoot[i] = (HUFF_ENTRY *)_ogg_calloc(1, sizeof(HUFF_ENTRY));
ret = ReadHuffTree(ci->HuffRoot[i], 0, opb);
if (ret) return ret;
}
return 0;
}
static void WriteHuffTree(HUFF_ENTRY *HuffRoot, oggpack_buffer *opb) {
if (HuffRoot->Value >= 0) {
oggpackB_write(opb, 1, 1);
oggpackB_write(opb, HuffRoot->Value, 5);
} else {
oggpackB_write(opb, 0, 1);
WriteHuffTree(HuffRoot->ZeroChild, opb);
WriteHuffTree(HuffRoot->OneChild, opb);
}
}
void WriteHuffmanTrees(HUFF_ENTRY *HuffRoot[NUM_HUFF_TABLES],
oggpack_buffer *opb) {
int i;
for(i=0; i<NUM_HUFF_TABLES; i++) {
WriteHuffTree(HuffRoot[i], opb);
}
}
static HUFF_ENTRY *CopyHuffTree(const HUFF_ENTRY *HuffSrc) {
if(HuffSrc){
HUFF_ENTRY *HuffDst;
HuffDst = (HUFF_ENTRY *)_ogg_calloc(1, sizeof(HUFF_ENTRY));
HuffDst->Value = HuffSrc->Value;
if (HuffSrc->Value < 0) {
HuffDst->ZeroChild = CopyHuffTree(HuffSrc->ZeroChild);
HuffDst->OneChild = CopyHuffTree(HuffSrc->OneChild);
}
return HuffDst;
}
return NULL;
}
void InitHuffmanTrees(PB_INSTANCE *pbi, const codec_setup_info *ci) {
int i;
pbi->ExtraBitLengths_VP3x = ExtraBitLengths_VP31;
for(i=0; i<NUM_HUFF_TABLES; i++){
pbi->HuffRoot_VP3x[i] = CopyHuffTree(ci->HuffRoot[i]);
}
}
void ClearHuffmanTrees(HUFF_ENTRY *HuffRoot[NUM_HUFF_TABLES]){
int i;
for(i=0; i<NUM_HUFF_TABLES; i++) {
DestroyHuffTree(HuffRoot[i]);
HuffRoot[i] = NULL;
}
}
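
WriteHuffTree above serializes a tree in pre-order: a leaf is a 1 bit followed by its 5-bit token value, and an internal node is a 0 bit followed by its zero child and then its one child. The sketch below reproduces that layout using '0'/'1' characters in a caller-supplied buffer instead of an oggpack_buffer. DemoNode and demo_write_tree are hypothetical names, and the most-significant-bit-first order of the 5-bit value is an assumption based on the oggpackB_* (big-endian) packer being used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for HUFF_ENTRY with only the fields needed here. */
typedef struct DemoNode {
  struct DemoNode *zero;
  struct DemoNode *one;
  int value;  /* 0..31 for a leaf, -1 for an internal node */
} DemoNode;

/* Append the pre-order encoding of the tree to out[] as text bits and
   return the new length. The caller must supply a large enough buffer:
   6 characters per leaf plus 1 per internal node. */
static size_t demo_write_tree(const DemoNode *node, char *out, size_t len)
{
  int bit;

  assert(node != NULL && out != NULL);
  if (node->value >= 0) {
    out[len++] = '1';                      /* leaf marker */
    for (bit = 4; bit >= 0; bit--)         /* 5-bit value, MSB first */
      out[len++] = (char)('0' + ((node->value >> bit) & 1));
  } else {
    out[len++] = '0';                      /* internal-node marker */
    len = demo_write_tree(node->zero, out, len);
    len = demo_write_tree(node->one, out, len);
  }
  return len;
}
```

ReadHuffTree inverts the same layout, which is why it rejects trees deeper than 32 levels: the format itself places no bound on depth.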

View file

@ -1,74 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: encoder_huffman.h 13884 2007-09-22 08:38:10Z giles $
********************************************************************/
/********************************************************************
* Constants
********************************************************************/
#define NUM_HUFF_TABLES 80
#define DC_HUFF_OFFSET 0
#define AC_HUFF_OFFSET 16
#define AC_TABLE_2_THRESH 5
#define AC_TABLE_3_THRESH 14
#define AC_TABLE_4_THRESH 27
#define DC_HUFF_CHOICES 16
#define DC_HUFF_CHOICE_BITS 4
#define AC_HUFF_CHOICES 16
#define AC_HUFF_CHOICE_BITS 4
/* Constants associated with entropy tokenisation. */
#define MAX_SINGLE_TOKEN_VALUE 6
#define DCT_VAL_CAT2_MIN 3
#define DCT_VAL_CAT3_MIN 7
#define DCT_VAL_CAT4_MIN 9
#define DCT_VAL_CAT5_MIN 13
#define DCT_VAL_CAT6_MIN 21
#define DCT_VAL_CAT7_MIN 37
#define DCT_VAL_CAT8_MIN 69
#define DCT_EOB_TOKEN 0
#define DCT_EOB_PAIR_TOKEN 1
#define DCT_EOB_TRIPLE_TOKEN 2
#define DCT_REPEAT_RUN_TOKEN 3
#define DCT_REPEAT_RUN2_TOKEN 4
#define DCT_REPEAT_RUN3_TOKEN 5
#define DCT_REPEAT_RUN4_TOKEN 6
#define DCT_SHORT_ZRL_TOKEN 7
#define DCT_ZRL_TOKEN 8
#define ONE_TOKEN 9 /* Special tokens for -1,1,-2,2 */
#define MINUS_ONE_TOKEN 10
#define TWO_TOKEN 11
#define MINUS_TWO_TOKEN 12
#define LOW_VAL_TOKENS (MINUS_TWO_TOKEN + 1)
#define DCT_VAL_CATEGORY3 (LOW_VAL_TOKENS + 4)
#define DCT_VAL_CATEGORY4 (DCT_VAL_CATEGORY3 + 1)
#define DCT_VAL_CATEGORY5 (DCT_VAL_CATEGORY4 + 1)
#define DCT_VAL_CATEGORY6 (DCT_VAL_CATEGORY5 + 1)
#define DCT_VAL_CATEGORY7 (DCT_VAL_CATEGORY6 + 1)
#define DCT_VAL_CATEGORY8 (DCT_VAL_CATEGORY7 + 1)
#define DCT_RUN_CATEGORY1 (DCT_VAL_CATEGORY8 + 1)
#define DCT_RUN_CATEGORY1B (DCT_RUN_CATEGORY1 + 5)
#define DCT_RUN_CATEGORY1C (DCT_RUN_CATEGORY1B + 1)
#define DCT_RUN_CATEGORY2 (DCT_RUN_CATEGORY1C + 1)
/* 32 */
#define MAX_ENTROPY_TOKENS (DCT_RUN_CATEGORY2 + 2)

View file

@ -1,572 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function: C implementation of the Theora iDCT
last mod: $Id: encoder_idct.c 14714 2008-04-12 01:04:43Z giles $
********************************************************************/
#include <string.h>
#include "codec_internal.h"
#include "quant_lookup.h"
#define IdctAdjustBeforeShift 8
/* cos(n*pi/16) or sin((8-n)*pi/16), scaled by 65536 */
#define xC1S7 64277
#define xC2S6 60547
#define xC3S5 54491
#define xC4S4 46341
#define xC5S3 36410
#define xC6S2 25080
#define xC7S1 12785
/* compute the 16 bit signed 1D inverse DCT - spec version */
/*
static void idct_short__c ( ogg_int16_t * InputData, ogg_int16_t * OutputData ) {
ogg_int32_t t[8], r;
ogg_int16_t *y = InputData;
ogg_int16_t *x = OutputData;
t[0] = y[0] + y[4];
t[0] &= 0xffff;
t[0] = (xC4S4 * t[0]) >> 16;
t[1] = y[0] - y[4];
t[1] &= 0xffff;
t[1] = (xC4S4 * t[1]) >> 16;
t[2] = ((xC6S2 * y[2]) >> 16) - ((xC2S6 * y[6]) >> 16);
t[3] = ((xC2S6 * y[2]) >> 16) + ((xC6S2 * y[6]) >> 16);
t[4] = ((xC7S1 * y[1]) >> 16) - ((xC1S7 * y[7]) >> 16);
t[5] = ((xC3S5 * y[5]) >> 16) - ((xC5S3 * y[3]) >> 16);
t[6] = ((xC5S3 * y[5]) >> 16) + ((xC3S5 * y[3]) >> 16);
t[7] = ((xC1S7 * y[1]) >> 16) + ((xC7S1 * y[7]) >> 16);
r = t[4] + t[5];
t[5] = t[4] - t[5];
t[5] &= 0xffff;
t[5] = (xC4S4 * (-t[5])) >> 16;
t[4] = r;
r = t[7] + t[6];
t[6] = t[7] - t[6];
t[6] &= 0xffff;
t[6] = (xC4S4 * t[6]) >> 16;
t[7] = r;
r = t[0] + t[3];
t[3] = t[0] - t[3];
t[0] = r;
r = t[1] + t[2];
t[2] = t[1] - t[2];
t[1] = r;
r = t[6] + t[5];
t[5] = t[6] - t[5];
t[6] = r;
r = t[0] + t[7];
r &= 0xffff;
x[0] = r;
r = t[1] + t[6];
r &= 0xffff;
x[1] = r;
r = t[2] + t[5];
r &= 0xffff;
x[2] = r;
r = t[3] + t[4];
r &= 0xffff;
x[3] = r;
r = t[3] - t[4];
r &= 0xffff;
x[4] = r;
r = t[2] - t[5];
r &= 0xffff;
x[5] = r;
r = t[1] - t[6];
r &= 0xffff;
x[6] = r;
r = t[0] - t[7];
r &= 0xffff;
x[7] = r;
}
*/
static void dequant_slow( ogg_int16_t * dequant_coeffs,
ogg_int16_t * quantized_list,
ogg_int32_t * DCT_block) {
int i;
for(i=0;i<64;i++)
DCT_block[dezigzag_index[i]] = quantized_list[i] * dequant_coeffs[i];
}
void IDctSlow__c( Q_LIST_ENTRY * InputData,
ogg_int16_t *QuantMatrix,
ogg_int16_t * OutputData ) {
ogg_int32_t IntermediateData[64];
ogg_int32_t * ip = IntermediateData;
ogg_int16_t * op = OutputData;
ogg_int32_t _A, _B, _C, _D, _Ad, _Bd, _Cd, _Dd, _E, _F, _G, _H;
ogg_int32_t _Ed, _Gd, _Add, _Bdd, _Fd, _Hd;
ogg_int32_t t1, t2;
int loop;
dequant_slow( QuantMatrix, InputData, IntermediateData);
/* Inverse DCT on the rows now */
for ( loop = 0; loop < 8; loop++){
/* Check for non-zero values */
if ( ip[0] | ip[1] | ip[2] | ip[3] | ip[4] | ip[5] | ip[6] | ip[7] ) {
t1 = (xC1S7 * ip[1]);
t2 = (xC7S1 * ip[7]);
t1 >>= 16;
t2 >>= 16;
_A = t1 + t2;
t1 = (xC7S1 * ip[1]);
t2 = (xC1S7 * ip[7]);
t1 >>= 16;
t2 >>= 16;
_B = t1 - t2;
t1 = (xC3S5 * ip[3]);
t2 = (xC5S3 * ip[5]);
t1 >>= 16;
t2 >>= 16;
_C = t1 + t2;
t1 = (xC3S5 * ip[5]);
t2 = (xC5S3 * ip[3]);
t1 >>= 16;
t2 >>= 16;
_D = t1 - t2;
t1 = (xC4S4 * (ogg_int16_t)(_A - _C));
t1 >>= 16;
_Ad = t1;
t1 = (xC4S4 * (ogg_int16_t)(_B - _D));
t1 >>= 16;
_Bd = t1;
_Cd = _A + _C;
_Dd = _B + _D;
t1 = (xC4S4 * (ogg_int16_t)(ip[0] + ip[4]));
t1 >>= 16;
_E = t1;
t1 = (xC4S4 * (ogg_int16_t)(ip[0] - ip[4]));
t1 >>= 16;
_F = t1;
t1 = (xC2S6 * ip[2]);
t2 = (xC6S2 * ip[6]);
t1 >>= 16;
t2 >>= 16;
_G = t1 + t2;
t1 = (xC6S2 * ip[2]);
t2 = (xC2S6 * ip[6]);
t1 >>= 16;
t2 >>= 16;
_H = t1 - t2;
_Ed = _E - _G;
_Gd = _E + _G;
_Add = _F + _Ad;
_Bdd = _Bd - _H;
_Fd = _F - _Ad;
_Hd = _Bd + _H;
/* Final sequence of operations over-write original inputs. */
ip[0] = (ogg_int16_t)((_Gd + _Cd ) >> 0);
ip[7] = (ogg_int16_t)((_Gd - _Cd ) >> 0);
ip[1] = (ogg_int16_t)((_Add + _Hd ) >> 0);
ip[2] = (ogg_int16_t)((_Add - _Hd ) >> 0);
ip[3] = (ogg_int16_t)((_Ed + _Dd ) >> 0);
ip[4] = (ogg_int16_t)((_Ed - _Dd ) >> 0);
ip[5] = (ogg_int16_t)((_Fd + _Bdd ) >> 0);
ip[6] = (ogg_int16_t)((_Fd - _Bdd ) >> 0);
}
ip += 8; /* next row */
}
ip = IntermediateData;
for ( loop = 0; loop < 8; loop++){
/* Check for non-zero values (bitwise or faster than ||) */
if ( ip[0 * 8] | ip[1 * 8] | ip[2 * 8] | ip[3 * 8] |
ip[4 * 8] | ip[5 * 8] | ip[6 * 8] | ip[7 * 8] ) {
t1 = (xC1S7 * ip[1*8]);
t2 = (xC7S1 * ip[7*8]);
t1 >>= 16;
t2 >>= 16;
_A = t1 + t2;
t1 = (xC7S1 * ip[1*8]);
t2 = (xC1S7 * ip[7*8]);
t1 >>= 16;
t2 >>= 16;
_B = t1 - t2;
t1 = (xC3S5 * ip[3*8]);
t2 = (xC5S3 * ip[5*8]);
t1 >>= 16;
t2 >>= 16;
_C = t1 + t2;
t1 = (xC3S5 * ip[5*8]);
t2 = (xC5S3 * ip[3*8]);
t1 >>= 16;
t2 >>= 16;
_D = t1 - t2;
t1 = (xC4S4 * (ogg_int16_t)(_A - _C));
t1 >>= 16;
_Ad = t1;
t1 = (xC4S4 * (ogg_int16_t)(_B - _D));
t1 >>= 16;
_Bd = t1;
_Cd = _A + _C;
_Dd = _B + _D;
t1 = (xC4S4 * (ogg_int16_t)(ip[0*8] + ip[4*8]));
t1 >>= 16;
_E = t1;
t1 = (xC4S4 * (ogg_int16_t)(ip[0*8] - ip[4*8]));
t1 >>= 16;
_F = t1;
t1 = (xC2S6 * ip[2*8]);
t2 = (xC6S2 * ip[6*8]);
t1 >>= 16;
t2 >>= 16;
_G = t1 + t2;
t1 = (xC6S2 * ip[2*8]);
t2 = (xC2S6 * ip[6*8]);
t1 >>= 16;
t2 >>= 16;
_H = t1 - t2;
_Ed = _E - _G;
_Gd = _E + _G;
_Add = _F + _Ad;
_Bdd = _Bd - _H;
_Fd = _F - _Ad;
_Hd = _Bd + _H;
_Gd += IdctAdjustBeforeShift;
_Add += IdctAdjustBeforeShift;
_Ed += IdctAdjustBeforeShift;
_Fd += IdctAdjustBeforeShift;
/* Final sequence of operations writes the output block. */
op[0*8] = (ogg_int16_t)((_Gd + _Cd ) >> 4);
op[7*8] = (ogg_int16_t)((_Gd - _Cd ) >> 4);
op[1*8] = (ogg_int16_t)((_Add + _Hd ) >> 4);
op[2*8] = (ogg_int16_t)((_Add - _Hd ) >> 4);
op[3*8] = (ogg_int16_t)((_Ed + _Dd ) >> 4);
op[4*8] = (ogg_int16_t)((_Ed - _Dd ) >> 4);
op[5*8] = (ogg_int16_t)((_Fd + _Bdd ) >> 4);
op[6*8] = (ogg_int16_t)((_Fd - _Bdd ) >> 4);
}else{
op[0*8] = 0;
op[7*8] = 0;
op[1*8] = 0;
op[2*8] = 0;
op[3*8] = 0;
op[4*8] = 0;
op[5*8] = 0;
op[6*8] = 0;
}
ip++; /* next column */
op++;
}
}
/************************
x x x x 0 0 0 0
x x x 0 0 0 0 0
x x 0 0 0 0 0 0
x 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
*************************/
static void dequant_slow10( ogg_int16_t * dequant_coeffs,
ogg_int16_t * quantized_list,
ogg_int32_t * DCT_block){
int i;
/* Only the top four rows (32 entries) are ever read by IDct10__c. */
memset(DCT_block, 0, 32*sizeof(*DCT_block));
for(i=0;i<10;i++)
DCT_block[dezigzag_index[i]] = quantized_list[i] * dequant_coeffs[i];
}
void IDct10__c( Q_LIST_ENTRY * InputData,
ogg_int16_t *QuantMatrix,
ogg_int16_t * OutputData ){
ogg_int32_t IntermediateData[64];
ogg_int32_t * ip = IntermediateData;
ogg_int16_t * op = OutputData;
ogg_int32_t _A, _B, _C, _D, _Ad, _Bd, _Cd, _Dd, _E, _F, _G, _H;
ogg_int32_t _Ed, _Gd, _Add, _Bdd, _Fd, _Hd;
ogg_int32_t t1, t2;
int loop;
dequant_slow10( QuantMatrix, InputData, IntermediateData);
/* Inverse DCT on the rows now */
for ( loop = 0; loop < 4; loop++){
/* Check for non-zero values */
if ( ip[0] | ip[1] | ip[2] | ip[3] ){
t1 = (xC1S7 * ip[1]);
t1 >>= 16;
_A = t1;
t1 = (xC7S1 * ip[1]);
t1 >>= 16;
_B = t1 ;
t1 = (xC3S5 * ip[3]);
t1 >>= 16;
_C = t1;
t2 = (xC5S3 * ip[3]);
t2 >>= 16;
_D = -t2;
t1 = (xC4S4 * (ogg_int16_t)(_A - _C));
t1 >>= 16;
_Ad = t1;
t1 = (xC4S4 * (ogg_int16_t)(_B - _D));
t1 >>= 16;
_Bd = t1;
_Cd = _A + _C;
_Dd = _B + _D;
t1 = (xC4S4 * ip[0] );
t1 >>= 16;
_E = t1;
_F = t1;
t1 = (xC2S6 * ip[2]);
t1 >>= 16;
_G = t1;
t1 = (xC6S2 * ip[2]);
t1 >>= 16;
_H = t1 ;
_Ed = _E - _G;
_Gd = _E + _G;
_Add = _F + _Ad;
_Bdd = _Bd - _H;
_Fd = _F - _Ad;
_Hd = _Bd + _H;
/* Final sequence of operations over-write original inputs. */
ip[0] = (ogg_int16_t)((_Gd + _Cd ) >> 0);
ip[7] = (ogg_int16_t)((_Gd - _Cd ) >> 0);
ip[1] = (ogg_int16_t)((_Add + _Hd ) >> 0);
ip[2] = (ogg_int16_t)((_Add - _Hd ) >> 0);
ip[3] = (ogg_int16_t)((_Ed + _Dd ) >> 0);
ip[4] = (ogg_int16_t)((_Ed - _Dd ) >> 0);
ip[5] = (ogg_int16_t)((_Fd + _Bdd ) >> 0);
ip[6] = (ogg_int16_t)((_Fd - _Bdd ) >> 0);
}
ip += 8; /* next row */
}
ip = IntermediateData;
for ( loop = 0; loop < 8; loop++) {
/* Check for non-zero values (bitwise or faster than ||) */
if ( ip[0 * 8] | ip[1 * 8] | ip[2 * 8] | ip[3 * 8] ) {
t1 = (xC1S7 * ip[1*8]);
t1 >>= 16;
_A = t1 ;
t1 = (xC7S1 * ip[1*8]);
t1 >>= 16;
_B = t1 ;
t1 = (xC3S5 * ip[3*8]);
t1 >>= 16;
_C = t1 ;
t2 = (xC5S3 * ip[3*8]);
t2 >>= 16;
_D = - t2;
t1 = (xC4S4 * (ogg_int16_t)(_A - _C));
t1 >>= 16;
_Ad = t1;
t1 = (xC4S4 * (ogg_int16_t)(_B - _D));
t1 >>= 16;
_Bd = t1;
_Cd = _A + _C;
_Dd = _B + _D;
t1 = (xC4S4 * ip[0*8]);
t1 >>= 16;
_E = t1;
_F = t1;
t1 = (xC2S6 * ip[2*8]);
t1 >>= 16;
_G = t1;
t1 = (xC6S2 * ip[2*8]);
t1 >>= 16;
_H = t1;
_Ed = _E - _G;
_Gd = _E + _G;
_Add = _F + _Ad;
_Bdd = _Bd - _H;
_Fd = _F - _Ad;
_Hd = _Bd + _H;
_Gd += IdctAdjustBeforeShift;
_Add += IdctAdjustBeforeShift;
_Ed += IdctAdjustBeforeShift;
_Fd += IdctAdjustBeforeShift;
/* Final sequence of operations writes the output block. */
op[0*8] = (ogg_int16_t)((_Gd + _Cd ) >> 4);
op[7*8] = (ogg_int16_t)((_Gd - _Cd ) >> 4);
op[1*8] = (ogg_int16_t)((_Add + _Hd ) >> 4);
op[2*8] = (ogg_int16_t)((_Add - _Hd ) >> 4);
op[3*8] = (ogg_int16_t)((_Ed + _Dd ) >> 4);
op[4*8] = (ogg_int16_t)((_Ed - _Dd ) >> 4);
op[5*8] = (ogg_int16_t)((_Fd + _Bdd ) >> 4);
op[6*8] = (ogg_int16_t)((_Fd - _Bdd ) >> 4);
}else{
op[0*8] = 0;
op[7*8] = 0;
op[1*8] = 0;
op[2*8] = 0;
op[3*8] = 0;
op[4*8] = 0;
op[5*8] = 0;
op[6*8] = 0;
}
ip++; /* next column */
op++;
}
}
/***************************
x 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
**************************/
void IDct1( Q_LIST_ENTRY * InputData,
ogg_int16_t *QuantMatrix,
ogg_int16_t * OutputData ){
int loop;
ogg_int16_t OutD;
OutD=(ogg_int16_t) ((ogg_int32_t)(InputData[0]*QuantMatrix[0]+15)>>5);
for(loop=0;loop<64;loop++)
OutputData[loop]=OutD;
}
void dsp_idct_init (DspFunctions *funcs, ogg_uint32_t cpu_flags)
{
funcs->IDctSlow = IDctSlow__c;
funcs->IDct10 = IDct10__c;
funcs->IDct3 = IDct10__c;
#if defined(USE_ASM)
// todo: make mmx encoder idct for MSC one day...
#if !defined (_MSC_VER)
if (cpu_flags & OC_CPU_X86_MMX) {
dsp_mmx_idct_init(funcs);
}
#endif
#endif
}
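
Two idioms recur throughout the iDCT code above: multiplying by a cosine stored as a 16.16 fixed-point constant followed by a right shift of 16, and the DC-only shortcut used by IDct1, which fills the whole block with (dc*quant + 15) >> 5. A minimal sketch of both follows. The names demo_mul_q16 and demo_idct_dc_only are hypothetical, and the right shift of a negative product is assumed to be arithmetic, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply x by a cosine stored as round(65536*cos),
   e.g. 46341 for cos(pi/4), and drop the 16 fractional bits. The largest
   constant (64277) times any int16_t value still fits in 32 bits.
   Assumes >> on a negative value is an arithmetic shift. */
static int32_t demo_mul_q16(int32_t cos_q16, int16_t x)
{
  assert(cos_q16 >= 0 && cos_q16 <= 65535);
  return (cos_q16 * (int32_t)x) >> 16;
}

/* Hypothetical helper mirroring IDct1: when only the DC coefficient is
   non-zero, every output sample has the same value, so the transform
   reduces to one dequantization, one rounding shift and a fill. */
static void demo_idct_dc_only(int16_t dc, int16_t quant, int16_t out[64])
{
  int16_t v;
  int i;

  assert(out != NULL);
  v = (int16_t)(((int32_t)dc * quant + 15) >> 5);
  for (i = 0; i < 64; i++)
    out[i] = v;
}
```

For example, demo_mul_q16(46341, 100) scales 100 by roughly 0.7071 and truncates the result to 70.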

View file

@ -1,120 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function: simple static lookups for VP3 frame encoder
last mod: $Id: encoder_lookup.h 15323 2008-09-19 19:43:59Z giles $
********************************************************************/
#include "codec_internal.h"
static const ogg_uint32_t MvPattern[(MAX_MV_EXTENT * 2) + 1] = {
0x000000ff, 0x000000fd, 0x000000fb, 0x000000f9,
0x000000f7, 0x000000f5, 0x000000f3, 0x000000f1,
0x000000ef, 0x000000ed, 0x000000eb, 0x000000e9,
0x000000e7, 0x000000e5, 0x000000e3, 0x000000e1,
0x0000006f, 0x0000006d, 0x0000006b, 0x00000069,
0x00000067, 0x00000065, 0x00000063, 0x00000061,
0x0000002f, 0x0000002d, 0x0000002b, 0x00000029,
0x00000009, 0x00000007, 0x00000002, 0x00000000,
0x00000001, 0x00000006, 0x00000008, 0x00000028,
0x0000002a, 0x0000002c, 0x0000002e, 0x00000060,
0x00000062, 0x00000064, 0x00000066, 0x00000068,
0x0000006a, 0x0000006c, 0x0000006e, 0x000000e0,
0x000000e2, 0x000000e4, 0x000000e6, 0x000000e8,
0x000000ea, 0x000000ec, 0x000000ee, 0x000000f0,
0x000000f2, 0x000000f4, 0x000000f6, 0x000000f8,
0x000000fa, 0x000000fc, 0x000000fe,
};
static const ogg_uint32_t MvBits[(MAX_MV_EXTENT * 2) + 1] = {
8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8,
7, 7, 7, 7, 7, 7, 7, 7,
6, 6, 6, 6, 4, 4, 3, 3,
3, 4, 4, 6, 6, 6, 6, 7,
7, 7, 7, 7, 7, 7, 7, 8,
8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8,
};
static const ogg_uint32_t MvPattern2[(MAX_MV_EXTENT * 2) + 1] = {
0x0000003f, 0x0000003d, 0x0000003b, 0x00000039,
0x00000037, 0x00000035, 0x00000033, 0x00000031,
0x0000002f, 0x0000002d, 0x0000002b, 0x00000029,
0x00000027, 0x00000025, 0x00000023, 0x00000021,
0x0000001f, 0x0000001d, 0x0000001b, 0x00000019,
0x00000017, 0x00000015, 0x00000013, 0x00000011,
0x0000000f, 0x0000000d, 0x0000000b, 0x00000009,
0x00000007, 0x00000005, 0x00000003, 0x00000000,
0x00000002, 0x00000004, 0x00000006, 0x00000008,
0x0000000a, 0x0000000c, 0x0000000e, 0x00000010,
0x00000012, 0x00000014, 0x00000016, 0x00000018,
0x0000001a, 0x0000001c, 0x0000001e, 0x00000020,
0x00000022, 0x00000024, 0x00000026, 0x00000028,
0x0000002a, 0x0000002c, 0x0000002e, 0x00000030,
0x00000032, 0x00000034, 0x00000036, 0x00000038,
0x0000003a, 0x0000003c, 0x0000003e,
};
static const ogg_uint32_t MvBits2[(MAX_MV_EXTENT * 2) + 1] = {
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6,
};
static const ogg_uint32_t ModeBitPatterns[MAX_MODES] = {
0x00, 0x02, 0x06, 0x0E, 0x1E, 0x3E, 0x7E, 0x7F };
static const ogg_int32_t ModeBitLengths[MAX_MODES] = {
1, 2, 3, 4, 5, 6, 7, 7 };
static const unsigned char ModeSchemes[MODE_METHODS-2][MAX_MODES] = {
/* Last Mv dominates */
{ 3, 4, 2, 0, 1, 5, 6, 7 }, /* L P M N I G GM 4 */
{ 2, 4, 3, 0, 1, 5, 6, 7 }, /* L P N M I G GM 4 */
{ 3, 4, 1, 0, 2, 5, 6, 7 }, /* L M P N I G GM 4 */
{ 2, 4, 1, 0, 3, 5, 6, 7 }, /* L M N P I G GM 4 */
/* No MV dominates */
{ 0, 4, 3, 1, 2, 5, 6, 7 }, /* N L P M I G GM 4 */
{ 0, 5, 4, 2, 3, 1, 6, 7 }, /* N G L P M I GM 4 */
};
static const ogg_uint32_t MvThreshTable[Q_TABLE_SIZE] = {
65, 65, 65, 65, 50, 50, 50, 50,
40, 40, 40, 40, 40, 40, 40, 40,
30, 30, 30, 30, 30, 30, 30, 30,
20, 20, 20, 20, 20, 20, 20, 20,
15, 15, 15, 15, 15, 15, 15, 15,
10, 10, 10, 10, 10, 10, 10, 10,
5, 5, 5, 5, 5, 5, 5, 5,
0, 0, 0, 0, 0, 0, 0, 0
};
static const ogg_uint32_t MVChangeFactorTable[Q_TABLE_SIZE] = {
11, 11, 11, 11, 12, 12, 12, 12,
13, 13, 13, 13, 13, 13, 13, 13,
14, 14, 14, 14, 14, 14, 14, 14,
14, 14, 14, 14, 14, 14, 14, 14,
14, 14, 14, 14, 14, 14, 14, 14,
14, 14, 14, 14, 14, 14, 14, 14,
15, 15, 15, 15, 15, 15, 15, 15,
15, 15, 15, 15, 15, 15, 15, 15
};
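/* Illustration only, not part of libtheora: MvPattern2 and MvBits2 above
   describe a fixed 6-bit code for one motion vector component. Reading the
   table entries, the magnitude appears to sit in the upper five bits and the
   sign flag in the lowest bit. The names mv_fixed_code, mv_table_index and
   SKETCH_MAX_MV_EXTENT are hypothetical, and the value 31 is an assumption
   inferred from the 63-entry tables. */

```c
#include <stdint.h>

/* Assumed value: the tables above have 2*31+1 = 63 entries. */
#define SKETCH_MAX_MV_EXTENT 31

/* Fixed-length code for one motion vector component:
   magnitude in the upper five bits, sign flag in the lowest bit. */
static uint32_t mv_fixed_code(int mv){
  uint32_t mag=(uint32_t)(mv<0?-mv:mv);
  return (mag<<1)|(uint32_t)(mv<0);
}

/* Table index for a component in [-31,31], as the table sizes imply. */
static int mv_table_index(int mv){
  return mv+SKETCH_MAX_MV_EXTENT;
}
```

/* Under these assumptions, mv_fixed_code(mv) reproduces
   MvPattern2[mv_table_index(mv)] for the entries listed above, for example
   0x3f for -31, 0x00 for 0 and 0x3e for +31. */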


@ -1,558 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2005 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: encoder_quant.c 15153 2008-08-04 18:37:55Z tterribe $
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include "codec_internal.h"
#include "quant_lookup.h"
#define OC_QUANT_MAX (1024<<2)
static const unsigned DC_QUANT_MIN[2]={4<<2,8<<2};
static const unsigned AC_QUANT_MIN[2]={2<<2,4<<2};
#define OC_MAXI(_a,_b) ((_a)<(_b)?(_b):(_a))
#define OC_MINI(_a,_b) ((_a)>(_b)?(_b):(_a))
#define OC_CLAMPI(_a,_b,_c) (OC_MAXI(_a,OC_MINI(_b,_c)))
static int ilog(unsigned _v){
int ret;
for(ret=0;_v;ret++)_v>>=1;
return ret;
}
void WriteQTables(PB_INSTANCE *pbi,oggpack_buffer* _opb) {
th_quant_info *_qinfo = &pbi->quant_info;
const th_quant_ranges *qranges;
const th_quant_base *base_mats[2*3*64];
int indices[2][3][64];
int nbase_mats;
int nbits;
int ci;
int qi;
int qri;
int qti;
int pli;
int qtj;
int plj;
int bmi;
int i;
/*Unlike the scale tables, we can't assume the maximum value will be in
index 0, so search for it here.*/
i=_qinfo->loop_filter_limits[0];
for(qi=1;qi<64;qi++)i=OC_MAXI(i,_qinfo->loop_filter_limits[qi]);
nbits=ilog(i);
oggpackB_write(_opb,nbits,3);
for(qi=0;qi<64;qi++){
oggpackB_write(_opb,_qinfo->loop_filter_limits[qi],nbits);
}
/* 580 bits for VP3.*/
nbits=OC_MAXI(ilog(_qinfo->ac_scale[0]),1);
oggpackB_write(_opb,nbits-1,4);
for(qi=0;qi<64;qi++)oggpackB_write(_opb,_qinfo->ac_scale[qi],nbits);
/* 516 bits for VP3.*/
nbits=OC_MAXI(ilog(_qinfo->dc_scale[0]),1);
oggpackB_write(_opb,nbits-1,4);
for(qi=0;qi<64;qi++)oggpackB_write(_opb,_qinfo->dc_scale[qi],nbits);
/*Consolidate any duplicate base matrices.*/
nbase_mats=0;
for(qti=0;qti<2;qti++)for(pli=0;pli<3;pli++){
qranges=_qinfo->qi_ranges[qti]+pli;
for(qri=0;qri<=qranges->nranges;qri++){
for(bmi=0;;bmi++){
if(bmi>=nbase_mats){
base_mats[bmi]=qranges->base_matrices+qri;
indices[qti][pli][qri]=nbase_mats++;
break;
}
else if(memcmp(base_mats[bmi][0],qranges->base_matrices[qri],
sizeof(base_mats[bmi][0]))==0){
indices[qti][pli][qri]=bmi;
break;
}
}
}
}
/*Write out the list of unique base matrices.
1545 bits for VP3 matrices.*/
oggpackB_write(_opb,nbase_mats-1,9);
for(bmi=0;bmi<nbase_mats;bmi++){
for(ci=0;ci<64;ci++)oggpackB_write(_opb,base_mats[bmi][0][ci],8);
}
/*Now store quant ranges and their associated indices into the base matrix
list.
46 bits for VP3 matrices.*/
nbits=ilog(nbase_mats-1);
for(i=0;i<6;i++){
qti=i/3;
pli=i%3;
qranges=_qinfo->qi_ranges[qti]+pli;
if(i>0){
if(qti>0){
if(qranges->nranges==_qinfo->qi_ranges[qti-1][pli].nranges&&
memcmp(qranges->sizes,_qinfo->qi_ranges[qti-1][pli].sizes,
qranges->nranges*sizeof(qranges->sizes[0]))==0&&
memcmp(indices[qti][pli],indices[qti-1][pli],
(qranges->nranges+1)*sizeof(indices[qti][pli][0]))==0){
oggpackB_write(_opb,1,2);
continue;
}
}
qtj=(i-1)/3;
plj=(i-1)%3;
if(qranges->nranges==_qinfo->qi_ranges[qtj][plj].nranges&&
memcmp(qranges->sizes,_qinfo->qi_ranges[qtj][plj].sizes,
qranges->nranges*sizeof(qranges->sizes[0]))==0&&
memcmp(indices[qti][pli],indices[qtj][plj],
(qranges->nranges+1)*sizeof(indices[qti][pli][0]))==0){
oggpackB_write(_opb,0,1+(qti>0));
continue;
}
oggpackB_write(_opb,1,1);
}
oggpackB_write(_opb,indices[qti][pli][0],nbits);
for(qi=qri=0;qi<63;qri++){
oggpackB_write(_opb,qranges->sizes[qri]-1,ilog(62-qi));
qi+=qranges->sizes[qri];
oggpackB_write(_opb,indices[qti][pli][qri+1],nbits);
}
}
}
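/* Illustration only, not part of libtheora: WriteQTables sizes each
   fixed-width field from the largest value it has to hold, using ilog().
   The sketch below restates that idea on a plain array. The names
   bits_needed and field_width are hypothetical helpers. */

```c
/* Number of bits needed to store v: 0 for v==0, otherwise floor(log2(v))+1.
   Same loop as ilog() above. */
static int bits_needed(unsigned v){
  int n=0;
  while(v){
    v>>=1;
    n++;
  }
  return n;
}

/* Width of a field that must hold every value in vals[0..n-1]; this mirrors
   the search for the maximum loop filter limit in WriteQTables. */
static int field_width(const unsigned *vals,int n){
  unsigned max=0;
  int i;
  for(i=0;i<n;i++)if(vals[i]>max)max=vals[i];
  return bits_needed(max);
}
```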
/* a copied/reconciled version of derf's theora-exp code; redundancy
should be eliminated at some point */
void InitQTables( PB_INSTANCE *pbi ){
int qti; /* coding mode: intra or inter */
int pli; /* Y U V */
th_quant_info *qinfo = &pbi->quant_info;
pbi->QThreshTable = pbi->quant_info.ac_scale;
for(qti=0;qti<2;qti++){
for(pli=0;pli<3;pli++){
int qi; /* quality index */
int qri; /* range iterator */
for(qi=0,qri=0; qri<=qinfo->qi_ranges[qti][pli].nranges; qri++){
th_quant_base base;
ogg_uint32_t q;
int qi_start;
int qi_end;
int ci;
memcpy(base,qinfo->qi_ranges[qti][pli].base_matrices[qri],
sizeof(base));
qi_start=qi;
if(qri==qinfo->qi_ranges[qti][pli].nranges)
qi_end=qi+1;
else
qi_end=qi+qinfo->qi_ranges[qti][pli].sizes[qri];
/* Iterate over quality indices in this range */
for(;;){
/*Scale the DC coefficient from the proper table.*/
q=((ogg_uint32_t)qinfo->dc_scale[qi]*base[0]/100)<<2;
q=OC_CLAMPI(DC_QUANT_MIN[qti],q,OC_QUANT_MAX);
pbi->quant_tables[qti][pli][qi][0]=(ogg_uint16_t)q;
/*Now scale AC coefficients from the proper table.*/
for(ci=1;ci<64;ci++){
q=((ogg_uint32_t)qinfo->ac_scale[qi]*base[ci]/100)<<2;
q=OC_CLAMPI(AC_QUANT_MIN[qti],q,OC_QUANT_MAX);
pbi->quant_tables[qti][pli][qi][ci]=(ogg_uint16_t)q;
}
if(++qi>=qi_end)break;
/*Interpolate the next base matrix.*/
for(ci=0;ci<64;ci++){
base[ci]=(unsigned char)
((2*((qi_end-qi)*qinfo->qi_ranges[qti][pli].base_matrices[qri][ci]+
(qi-qi_start)*qinfo->qi_ranges[qti][pli].base_matrices[qri+1][ci])
+qinfo->qi_ranges[qti][pli].sizes[qri])/
(2*qinfo->qi_ranges[qti][pli].sizes[qri]));
}
}
}
}
}
}
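/* Illustration only, not part of libtheora: the two arithmetic steps of
   InitQTables, pulled out as stand-alone helpers. scale_quantizer applies the
   percentage scale, the <<2 normalisation and the clamp; interp_base is the
   rounded linear interpolation between two base-matrix entries. Both names
   and SKETCH_QUANT_MAX are hypothetical; the formulas are copied from the
   function above. */

```c
#include <stdint.h>

#define SKETCH_QUANT_MAX (1024<<2)

/* Scale one base-matrix entry by a percentage scale factor, apply the <<2
   normalisation, then clamp the result to [qmin, SKETCH_QUANT_MAX]. */
static uint16_t scale_quantizer(uint32_t scale,uint32_t base,uint32_t qmin){
  uint32_t q=(scale*base/100)<<2;
  if(q>SKETCH_QUANT_MAX)q=SKETCH_QUANT_MAX;
  if(q<qmin)q=qmin;
  return (uint16_t)q;
}

/* Rounded linear interpolation between base-matrix entries b0 and b1 at
   quality index qi inside a range [qi_start,qi_end) of the given size. */
static unsigned char interp_base(int b0,int b1,int qi,
 int qi_start,int qi_end,int size){
  return (unsigned char)
   ((2*((qi_end-qi)*b0+(qi-qi_start)*b1)+size)/(2*size));
}
```

/* Example: with the intra DC minimum of 4<<2 == 16, a scale of 100 and a base
   value of 16 give 64, while a scale of 1 and a base value of 1 are raised to
   the minimum of 16. */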
static void BuildZigZagIndex(PB_INSTANCE *pbi){
ogg_int32_t i,j;
/* invert the row to zigzag coefficient order lookup table */
for ( i = 0; i < BLOCK_SIZE; i++ ){
j = dezigzag_index[i];
pbi->zigzag_index[j] = i;
}
}
static void init_quantizer ( CP_INSTANCE *cpi,
unsigned char QIndex ){
int i;
double ZBinFactor;
double RoundingFactor;
double temp_fp_quant_coeffs;
double temp_fp_quant_round;
double temp_fp_ZeroBinSize;
PB_INSTANCE *pbi = &cpi->pb;
const ogg_uint16_t * temp_Y_coeffs;
const ogg_uint16_t * temp_U_coeffs;
const ogg_uint16_t * temp_V_coeffs;
const ogg_uint16_t * temp_Inter_Y_coeffs;
const ogg_uint16_t * temp_Inter_U_coeffs;
const ogg_uint16_t * temp_Inter_V_coeffs;
ogg_uint16_t scale_factor = cpi->pb.quant_info.ac_scale[QIndex];
/* Notes on setup of quantisers. The initial multiplication by
the scale factor is done in the ogg_int32_t domain to ensure that the
precision in the quantiser is the same as in the inverse
quantiser where all calculations are integer. The "<< 2" is a
normalisation factor for the forward DCT transform. */
temp_Y_coeffs = pbi->quant_tables[0][0][QIndex];
temp_U_coeffs = pbi->quant_tables[0][1][QIndex];
temp_V_coeffs = pbi->quant_tables[0][2][QIndex];
temp_Inter_Y_coeffs = pbi->quant_tables[1][0][QIndex];
temp_Inter_U_coeffs = pbi->quant_tables[1][1][QIndex];
temp_Inter_V_coeffs = pbi->quant_tables[1][2][QIndex];
ZBinFactor = 0.9;
switch(cpi->pb.info.sharpness){
case 0:
ZBinFactor = 0.65;
if ( scale_factor <= 50 )
RoundingFactor = 0.499;
else
RoundingFactor = 0.46;
break;
case 1:
ZBinFactor = 0.75;
if ( scale_factor <= 50 )
RoundingFactor = 0.476;
else
RoundingFactor = 0.400;
break;
default:
ZBinFactor = 0.9;
if ( scale_factor <= 50 )
RoundingFactor = 0.476;
else
RoundingFactor = 0.333;
break;
}
/* Use fixed multiplier for intra Y DC */
temp_fp_quant_coeffs = temp_Y_coeffs[0];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_Y_round[0] = (ogg_int32_t) (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_Y[0] = (ogg_int32_t) (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs = 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_Y_coeffs[0] = (0.5 + SHIFT16 * temp_fp_quant_coeffs);
/* Intra U */
temp_fp_quant_coeffs = temp_U_coeffs[0];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_U_round[0] = (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_U[0] = (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs = 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_U_coeffs[0]= (0.5 + SHIFT16 * temp_fp_quant_coeffs);
/* Intra V */
temp_fp_quant_coeffs = temp_V_coeffs[0];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_V_round[0] = (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_V[0] = (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs = 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_V_coeffs[0]= (0.5 + SHIFT16 * temp_fp_quant_coeffs);
/* Inter Y */
temp_fp_quant_coeffs = temp_Inter_Y_coeffs[0];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_Inter_Y_round[0]= (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_Inter_Y[0]= (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs= 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_Inter_Y_coeffs[0]= (0.5 + SHIFT16 * temp_fp_quant_coeffs);
/* Inter U */
temp_fp_quant_coeffs = temp_Inter_U_coeffs[0];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_Inter_U_round[0]= (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_Inter_U[0]= (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs= 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_Inter_U_coeffs[0]= (0.5 + SHIFT16 * temp_fp_quant_coeffs);
/* Inter V */
temp_fp_quant_coeffs = temp_Inter_V_coeffs[0];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_Inter_V_round[0]= (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_Inter_V[0]= (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs= 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_Inter_V_coeffs[0]= (0.5 + SHIFT16 * temp_fp_quant_coeffs);
for ( i = 1; i < 64; i++ ){
/* Intra Y */
temp_fp_quant_coeffs = temp_Y_coeffs[i];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_Y_round[i] = (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_Y[i] = (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs = 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_Y_coeffs[i] = (0.5 + SHIFT16 * temp_fp_quant_coeffs);
/* Intra U */
temp_fp_quant_coeffs = temp_U_coeffs[i];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_U_round[i] = (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_U[i] = (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs = 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_U_coeffs[i]= (0.5 + SHIFT16 * temp_fp_quant_coeffs);
/* Intra V */
temp_fp_quant_coeffs = temp_V_coeffs[i];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_V_round[i] = (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_V[i] = (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs = 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_V_coeffs[i]= (0.5 + SHIFT16 * temp_fp_quant_coeffs);
/* Inter Y */
temp_fp_quant_coeffs = temp_Inter_Y_coeffs[i];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_Inter_Y_round[i]= (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_Inter_Y[i]= (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs = 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_Inter_Y_coeffs[i]= (0.5 + SHIFT16 * temp_fp_quant_coeffs);
/* Inter U */
temp_fp_quant_coeffs = temp_Inter_U_coeffs[i];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_Inter_U_round[i]= (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_Inter_U[i]= (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs = 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_Inter_U_coeffs[i]= (0.5 + SHIFT16 * temp_fp_quant_coeffs);
/* Inter V */
temp_fp_quant_coeffs = temp_Inter_V_coeffs[i];
temp_fp_quant_round = temp_fp_quant_coeffs * RoundingFactor;
pbi->fp_quant_Inter_V_round[i]= (0.5 + temp_fp_quant_round);
temp_fp_ZeroBinSize = temp_fp_quant_coeffs * ZBinFactor;
pbi->fp_ZeroBinSize_Inter_V[i]= (0.5 + temp_fp_ZeroBinSize);
temp_fp_quant_coeffs = 1.0 / temp_fp_quant_coeffs;
pbi->fp_quant_Inter_V_coeffs[i]= (0.5 + SHIFT16 * temp_fp_quant_coeffs);
}
pbi->fquant_coeffs = pbi->fp_quant_Y_coeffs;
}
void select_quantiser(PB_INSTANCE *pbi, int type) {
/* select a quantiser according to what plane has to be coded in what
* mode. Could be extended to a more sophisticated scheme. */
switch(type) {
case BLOCK_Y:
pbi->fquant_coeffs = pbi->fp_quant_Y_coeffs;
pbi->fquant_round = pbi->fp_quant_Y_round;
pbi->fquant_ZbSize = pbi->fp_ZeroBinSize_Y;
break;
case BLOCK_U:
pbi->fquant_coeffs = pbi->fp_quant_U_coeffs;
pbi->fquant_round = pbi->fp_quant_U_round;
pbi->fquant_ZbSize = pbi->fp_ZeroBinSize_U;
break;
case BLOCK_V:
pbi->fquant_coeffs = pbi->fp_quant_V_coeffs;
pbi->fquant_round = pbi->fp_quant_V_round;
pbi->fquant_ZbSize = pbi->fp_ZeroBinSize_V;
break;
case BLOCK_INTER_Y:
pbi->fquant_coeffs = pbi->fp_quant_Inter_Y_coeffs;
pbi->fquant_round = pbi->fp_quant_Inter_Y_round;
pbi->fquant_ZbSize = pbi->fp_ZeroBinSize_Inter_Y;
break;
case BLOCK_INTER_U:
pbi->fquant_coeffs = pbi->fp_quant_Inter_U_coeffs;
pbi->fquant_round = pbi->fp_quant_Inter_U_round;
pbi->fquant_ZbSize = pbi->fp_ZeroBinSize_Inter_U;
break;
case BLOCK_INTER_V:
pbi->fquant_coeffs = pbi->fp_quant_Inter_V_coeffs;
pbi->fquant_round = pbi->fp_quant_Inter_V_round;
pbi->fquant_ZbSize = pbi->fp_ZeroBinSize_Inter_V;
break;
}
}
void quantize( PB_INSTANCE *pbi,
ogg_int16_t * DCT_block,
Q_LIST_ENTRY * quantized_list){
ogg_uint32_t i; /* Row index */
Q_LIST_ENTRY val; /* Quantised value. */
ogg_int32_t * FquantRoundPtr = pbi->fquant_round;
ogg_int32_t * FquantCoeffsPtr = pbi->fquant_coeffs;
ogg_int32_t * FquantZBinSizePtr = pbi->fquant_ZbSize;
ogg_int16_t * DCT_blockPtr = DCT_block;
ogg_uint32_t * ZigZagPtr = (ogg_uint32_t *)pbi->zigzag_index;
ogg_int32_t temp;
/* Set the quantized_list to default to 0 */
memset( quantized_list, 0, 64 * sizeof(Q_LIST_ENTRY) );
/* Note: the rounding term (a fraction of the divisor) is added for positive
values and subtracted for negative values, so both round away from zero. */
for( i = 0; i < VFRAGPIXELS; i++) {
int col;
/* Iterate through columns */
for( col = 0; col < 8; col++) {
if ( DCT_blockPtr[col] >= FquantZBinSizePtr[col] ) {
temp = FquantCoeffsPtr[col] * ( DCT_blockPtr[col] + FquantRoundPtr[col] ) ;
val = (Q_LIST_ENTRY) (temp>>16);
quantized_list[ZigZagPtr[col]] = ( val > 511 ) ? 511 : val;
} else if ( DCT_blockPtr[col] <= -FquantZBinSizePtr[col] ) {
temp = FquantCoeffsPtr[col] *
( DCT_blockPtr[col] - FquantRoundPtr[col] ) + MIN16;
val = (Q_LIST_ENTRY) (temp>>16);
quantized_list[ZigZagPtr[col]] = ( val < -511 ) ? -511 : val;
}
}
FquantRoundPtr += 8;
FquantCoeffsPtr += 8;
FquantZBinSizePtr += 8;
DCT_blockPtr += 8;
ZigZagPtr += 8;
}
}
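/* Illustration only, not part of libtheora: the per-coefficient arithmetic of
   init_quantizer() and quantize() for a single quantiser value q. Assumptions:
   SHIFT16 is taken to be 65536, and the negative branch is computed on the
   magnitude and then negated, which avoids right-shifting a negative value
   (implementation-defined in C) instead of adding MIN16 as the code above
   does. quantize_coeff is a hypothetical name. */

```c
#include <stdint.h>

/* Quantise one DCT coefficient x with quantiser q (q>0).
   rounding and zbin_factor play the roles of RoundingFactor and ZBinFactor. */
static int quantize_coeff(int x,int q,double rounding,double zbin_factor){
  int32_t recip=(int32_t)(0.5+65536.0/q);     /* 16.16 reciprocal of q */
  int32_t round=(int32_t)(0.5+q*rounding);    /* rounding offset */
  int32_t zbin=(int32_t)(0.5+q*zbin_factor);  /* zero-bin threshold */
  int32_t mag=x<0?-x:x;
  int32_t val;
  if(mag<zbin)return 0;                       /* inside the zero bin */
  val=(recip*(mag+round))>>16;
  if(val>511)val=511;                         /* same limit as quantize() */
  return x<0?-val:val;
}
```

/* Example: with q=64, rounding 0.476 and a zero-bin factor of 0.9, an input of
   57 falls inside the zero bin and quantises to 0, while 100 quantises to 2. */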
static void init_dequantizer ( PB_INSTANCE *pbi,
unsigned char QIndex ){
int i, j;
ogg_uint16_t * InterY_coeffs;
ogg_uint16_t * InterU_coeffs;
ogg_uint16_t * InterV_coeffs;
ogg_uint16_t * Y_coeffs;
ogg_uint16_t * U_coeffs;
ogg_uint16_t * V_coeffs;
Y_coeffs = pbi->quant_tables[0][0][QIndex];
U_coeffs = pbi->quant_tables[0][1][QIndex];
V_coeffs = pbi->quant_tables[0][2][QIndex];
InterY_coeffs = pbi->quant_tables[1][0][QIndex];
InterU_coeffs = pbi->quant_tables[1][1][QIndex];
InterV_coeffs = pbi->quant_tables[1][2][QIndex];
/* Invert the dequant index into the quant index: the decoder (dxer) uses a
different coefficient order than the encoder (cxer). */
BuildZigZagIndex(pbi);
/* Reorder dequantisation coefficients into dct zigzag order. */
for ( i = 0; i < BLOCK_SIZE; i++ ) {
j = pbi->zigzag_index[i];
pbi->dequant_Y_coeffs[j] = Y_coeffs[i];
}
for ( i = 0; i < BLOCK_SIZE; i++ ) {
j = pbi->zigzag_index[i];
pbi->dequant_U_coeffs[j] = U_coeffs[i];
}
for ( i = 0; i < BLOCK_SIZE; i++ ) {
j = pbi->zigzag_index[i];
pbi->dequant_V_coeffs[j] = V_coeffs[i];
}
for ( i = 0; i < BLOCK_SIZE; i++ ){
j = pbi->zigzag_index[i];
pbi->dequant_InterY_coeffs[j] = InterY_coeffs[i];
}
for ( i = 0; i < BLOCK_SIZE; i++ ){
j = pbi->zigzag_index[i];
pbi->dequant_InterU_coeffs[j] = InterU_coeffs[i];
}
for ( i = 0; i < BLOCK_SIZE; i++ ){
j = pbi->zigzag_index[i];
pbi->dequant_InterV_coeffs[j] = InterV_coeffs[i];
}
pbi->dequant_coeffs = pbi->dequant_Y_coeffs;
}
void UpdateQ( PB_INSTANCE *pbi, int NewQIndex ){
ogg_uint32_t qscale;
/* clamp to legal bounds */
if (NewQIndex >= Q_TABLE_SIZE) NewQIndex = Q_TABLE_SIZE - 1;
else if (NewQIndex < 0) NewQIndex = 0;
pbi->FrameQIndex = NewQIndex;
qscale = pbi->quant_info.ac_scale[NewQIndex];
pbi->ThisFrameQualityValue = qscale;
/* Re-initialise the dequantiser tables for the new Q index. */
init_dequantizer ( pbi, (unsigned char) pbi->FrameQIndex );
}
void UpdateQC( CP_INSTANCE *cpi, ogg_uint32_t NewQ ){
ogg_uint32_t qscale;
PB_INSTANCE *pbi = &cpi->pb;
/* Do bounds checking against the AC scale table. */
qscale = NewQ;
if ( qscale < pbi->quant_info.ac_scale[Q_TABLE_SIZE-1] )
qscale = pbi->quant_info.ac_scale[Q_TABLE_SIZE-1];
else if ( qscale > pbi->quant_info.ac_scale[0] )
qscale = pbi->quant_info.ac_scale[0];
/* Find the frame Q index: the largest index whose AC scale is still >= NewQ. */
pbi->FrameQIndex = Q_TABLE_SIZE - 1;
while ((ogg_int32_t) pbi->FrameQIndex >= 0 ) {
if ( (pbi->FrameQIndex == 0) ||
( pbi->quant_info.ac_scale[pbi->FrameQIndex] >= NewQ) )
break;
pbi->FrameQIndex --;
}
/* Re-initialise the Q tables for forward and reverse transforms. */
init_quantizer ( cpi, pbi->FrameQIndex );
init_dequantizer ( pbi, pbi->FrameQIndex );
}
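/* Illustration only, not part of libtheora: the index search in UpdateQC on a
   plain array. It assumes the scale table is non-increasing (largest value at
   index 0), which is what the bounds check in UpdateQC relies on.
   find_q_index is a hypothetical name. */

```c
/* Return the largest index i in [0,n-1] with scale[i]>=target, or 0 if no
   entry is large enough. The table is assumed to be non-increasing. */
static int find_q_index(const unsigned *scale,int n,unsigned target){
  int i=n-1;
  while(i>0&&scale[i]<target)i--;
  return i;
}
```

/* Targets below the smallest entry map to the last index and targets above
   the largest entry map to index 0, which is why the separate clamp of qscale
   in UpdateQC does not change the result. */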

File diff suppressed because it is too large


@ -1,243 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: frarray.c 15153 2008-08-04 18:37:55Z tterribe $
********************************************************************/
#include <string.h>
#include "codec_internal.h"
#include "block_inline.h"
/* Long run bit string coding */
static ogg_uint32_t FrArrayCodeSBRun( CP_INSTANCE *cpi, ogg_uint32_t value){
ogg_uint32_t CodedVal = 0;
ogg_uint32_t CodedBits = 0;
/* Coding scheme:
Codeword            RunLength
0                   1
10x                 2-3
110x                4-5
1110xx              6-9
11110xxx            10-17
111110xxxx          18-33
111111xxxxxxxxxxxx  34-4129 */
if ( value == 1 ){
CodedVal = 0;
CodedBits = 1;
} else if ( value <= 3 ) {
CodedVal = 0x0004 + (value - 2);
CodedBits = 3;
} else if ( value <= 5 ) {
CodedVal = 0x000C + (value - 4);
CodedBits = 4;
} else if ( value <= 9 ) {
CodedVal = 0x0038 + (value - 6);
CodedBits = 6;
} else if ( value <= 17 ) {
CodedVal = 0x00F0 + (value - 10);
CodedBits = 8;
} else if ( value <= 33 ) {
CodedVal = 0x03E0 + (value - 18);
CodedBits = 10;
} else {
CodedVal = 0x3F000 + (value - 34);
CodedBits = 18;
}
/* Add the bits to the encode holding buffer. */
oggpackB_write( cpi->oggbuffer, CodedVal, CodedBits );
return CodedBits;
}
/* Short run bit string coding */
static ogg_uint32_t FrArrayCodeBlockRun( CP_INSTANCE *cpi,
ogg_uint32_t value ) {
ogg_uint32_t CodedVal = 0;
ogg_uint32_t CodedBits = 0;
/* Coding scheme:
Codeword   RunLength
0x         1-2
10x        3-4
110x       5-6
1110xx     7-10
11110xx    11-14
11111xxxx  15-30 */
if ( value <= 2 ) {
CodedVal = value - 1;
CodedBits = 2;
} else if ( value <= 4 ) {
CodedVal = 0x0004 + (value - 3);
CodedBits = 3;
} else if ( value <= 6 ) {
CodedVal = 0x000C + (value - 5);
CodedBits = 4;
} else if ( value <= 10 ) {
CodedVal = 0x0038 + (value - 7);
CodedBits = 6;
} else if ( value <= 14 ) {
CodedVal = 0x0078 + (value - 11);
CodedBits = 7;
} else {
CodedVal = 0x01F0 + (value - 15);
CodedBits = 9;
}
/* Add the bits to the encode holding buffer. */
oggpackB_write( cpi->oggbuffer, CodedVal, CodedBits );
return CodedBits;
}
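/* Illustration only, not part of libtheora: the long-run code of
   FrArrayCodeSBRun expressed as a table lookup that returns the codeword
   instead of writing it to a bit buffer. The type run_code and the function
   sb_run_code are hypothetical; the prefixes and lengths are copied from the
   if/else chain above. */

```c
#include <stdint.h>

typedef struct{
  uint32_t bits;  /* codeword value, to be written MSB first */
  int      nbits; /* codeword length in bits */
}run_code;

/* Long-run code for a run length in [1,4129]. */
static run_code sb_run_code(uint32_t run){
  /* One row per codeword class: first run length, prefix value, length. */
  static const struct{
    uint32_t first;
    uint32_t prefix;
    int      nbits;
  }rows[7]={
    {1,0x0,1},{2,0x4,3},{4,0xC,4},{6,0x38,6},
    {10,0xF0,8},{18,0x3E0,10},{34,0x3F000,18}
  };
  run_code rc;
  int i=6;
  /* Walk down to the class whose first run length does not exceed run. */
  while(i>0&&run<rows[i].first)i--;
  rc.bits=rows[i].prefix+(run-rows[i].first);
  rc.nbits=rows[i].nbits;
  return rc;
}
```

/* The short-run code of FrArrayCodeBlockRun could be tabulated the same way
   with its own rows. */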
void PackAndWriteDFArray( CP_INSTANCE *cpi ){
ogg_uint32_t i;
unsigned char val;
ogg_uint32_t run_count;
ogg_uint32_t SB, MB, B; /* SB, MB and block loop variables */
ogg_uint32_t BListIndex = 0;
ogg_uint32_t LastSbBIndex = 0;
ogg_int32_t DfBlockIndex; /* Block index in display_fragments */
/* Initialise workspaces */
memset( cpi->pb.SBFullyFlags, 1, cpi->pb.SuperBlocks);
memset( cpi->pb.SBCodedFlags, 0, cpi->pb.SuperBlocks );
memset( cpi->PartiallyCodedFlags, 0, cpi->pb.SuperBlocks );
memset( cpi->BlockCodedFlags, 0, cpi->pb.UnitFragments);
for( SB = 0; SB < cpi->pb.SuperBlocks; SB++ ) {
/* Check for coded blocks and macro-blocks */
for ( MB=0; MB<4; MB++ ) {
/* If MB in frame */
if ( QuadMapToMBTopLeft(cpi->pb.BlockMap,SB,MB) >= 0 ) {
for ( B=0; B<4; B++ ) {
DfBlockIndex = QuadMapToIndex1( cpi->pb.BlockMap,SB, MB, B );
/* Does Block lie in frame: */
if ( DfBlockIndex >= 0 ) {
/* In frame: if it is not coded then this SB is only
partly coded. */
if ( cpi->pb.display_fragments[DfBlockIndex] ) {
cpi->pb.SBCodedFlags[SB] = 1; /* SB at least partly coded */
cpi->BlockCodedFlags[BListIndex] = 1; /* Block is coded */
}else{
cpi->pb.SBFullyFlags[SB] = 0; /* SB not fully coded */
cpi->BlockCodedFlags[BListIndex] = 0; /* Block is not coded */
}
BListIndex++;
}
}
}
}
/* Is the SB fully coded or fully uncoded?
If so, rewind BListIndex to its value at the end of the previous SB. */
if ( cpi->pb.SBFullyFlags[SB] || !cpi->pb.SBCodedFlags[SB] ) {
BListIndex = LastSbBIndex; /* Reset to values from previous SB */
}else{
cpi->PartiallyCodedFlags[SB] = 1; /* Set up list of partially
coded SBs */
LastSbBIndex = BListIndex;
}
}
/* Code the list of partially coded super-blocks. */
val = cpi->PartiallyCodedFlags[0];
oggpackB_write( cpi->oggbuffer, (ogg_uint32_t)val, 1);
i = 0;
while ( i < cpi->pb.SuperBlocks ) {
run_count = 0;
while ( (i<cpi->pb.SuperBlocks) &&
(cpi->PartiallyCodedFlags[i]==val) &&
run_count<4129 ) {
i++;
run_count++;
}
/* Code the run */
FrArrayCodeSBRun( cpi, run_count);
if(run_count >= 4129 && i < cpi->pb.SuperBlocks ){
val = cpi->PartiallyCodedFlags[i];
oggpackB_write( cpi->oggbuffer, (ogg_uint32_t)val, 1);
}else
val = ( val == 0 ) ? 1 : 0;
}
/* Run-length code the fully coded / not coded super-block flags. */
i = 0;
/* Skip partially coded super-blocks */
while( (i < cpi->pb.SuperBlocks) && cpi->PartiallyCodedFlags[i] )
i++;
if ( i < cpi->pb.SuperBlocks ) {
val = cpi->pb.SBFullyFlags[i];
oggpackB_write( cpi->oggbuffer, (ogg_uint32_t)val, 1);
while ( i < cpi->pb.SuperBlocks ) {
run_count = 0;
while ( (i < cpi->pb.SuperBlocks) &&
(cpi->pb.SBFullyFlags[i] == val) &&
run_count < 4129) {
i++;
/* Skip partially coded super-blocks */
while( (i < cpi->pb.SuperBlocks) && cpi->PartiallyCodedFlags[i] )
i++;
run_count++;
}
/* Code the run */
FrArrayCodeSBRun( cpi, run_count );
if(run_count >= 4129 && i < cpi->pb.SuperBlocks ){
val = cpi->pb.SBFullyFlags[i];
oggpackB_write( cpi->oggbuffer, (ogg_uint32_t)val, 1);
}else
val = ( val == 0 ) ? 1 : 0;
}
}
/* Now code the block flags */
if ( BListIndex > 0 ) {
/* Code the block flags start value */
val = cpi->BlockCodedFlags[0];
oggpackB_write( cpi->oggbuffer, (ogg_uint32_t)val, 1);
/* Now code the block flags. */
for ( i = 0; i < BListIndex; ) {
run_count = 0;
while ( (i < BListIndex) && (cpi->BlockCodedFlags[i] == val) ) {
i++;
run_count++;
}
FrArrayCodeBlockRun( cpi, run_count );
val = ( val == 0 ) ? 1 : 0;
}
}
}


@ -1,392 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: frinit.c 15153 2008-08-04 18:37:55Z tterribe $
********************************************************************/
#include <stdlib.h>
#include "codec_internal.h"
void InitializeFragCoordinates(PB_INSTANCE *pbi){
ogg_uint32_t i, j;
ogg_uint32_t HorizFrags = pbi->HFragments;
ogg_uint32_t VertFrags = pbi->VFragments;
ogg_uint32_t StartFrag = 0;
/* Y */
for(i = 0; i< VertFrags; i++){
for(j = 0; j< HorizFrags; j++){
ogg_uint32_t ThisFrag = i * HorizFrags + j;
pbi->FragCoordinates[ ThisFrag ].x=j * BLOCK_HEIGHT_WIDTH;
pbi->FragCoordinates[ ThisFrag ].y=i * BLOCK_HEIGHT_WIDTH;
}
}
/* U */
HorizFrags >>= 1;
VertFrags >>= 1;
StartFrag = pbi->YPlaneFragments;
for(i = 0; i< VertFrags; i++) {
for(j = 0; j< HorizFrags; j++) {
ogg_uint32_t ThisFrag = StartFrag + i * HorizFrags + j;
pbi->FragCoordinates[ ThisFrag ].x=j * BLOCK_HEIGHT_WIDTH;
pbi->FragCoordinates[ ThisFrag ].y=i * BLOCK_HEIGHT_WIDTH;
}
}
/* V */
StartFrag = pbi->YPlaneFragments + pbi->UVPlaneFragments;
for(i = 0; i< VertFrags; i++) {
for(j = 0; j< HorizFrags; j++) {
ogg_uint32_t ThisFrag = StartFrag + i * HorizFrags + j;
pbi->FragCoordinates[ ThisFrag ].x=j * BLOCK_HEIGHT_WIDTH;
pbi->FragCoordinates[ ThisFrag ].y=i * BLOCK_HEIGHT_WIDTH;
}
}
}
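/* Illustration only, not part of libtheora: the mapping used by
   InitializeFragCoordinates from a fragment's index within its plane to the
   pixel position of its top-left corner. SKETCH_BLOCK_SIZE stands in for
   BLOCK_HEIGHT_WIDTH and its value of 8 is an assumption; frag_xy and
   frag_coordinate are hypothetical names. */

```c
#define SKETCH_BLOCK_SIZE 8 /* assumed value of BLOCK_HEIGHT_WIDTH */

typedef struct{
  unsigned x;
  unsigned y;
}frag_xy;

/* Top-left pixel of fragment number frag, counted in raster order within a
   plane that is hfrags fragments wide. */
static frag_xy frag_coordinate(unsigned frag,unsigned hfrags){
  frag_xy c;
  c.x=(frag%hfrags)*SKETCH_BLOCK_SIZE;
  c.y=(frag/hfrags)*SKETCH_BLOCK_SIZE;
  return c;
}
```

/* For the U and V planes the same mapping applies with the halved fragment
   width and an index relative to the start of that plane, as in the loops
   above. */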
static void CalcPixelIndexTable( PB_INSTANCE *pbi){
ogg_uint32_t i;
ogg_uint32_t * PixelIndexTablePtr;
/* Calculate the pixel index table for normal image buffers */
PixelIndexTablePtr = pbi->pixel_index_table;
for ( i = 0; i < pbi->YPlaneFragments; i++ ) {
PixelIndexTablePtr[ i ] =
((i / pbi->HFragments) * VFRAGPIXELS *
pbi->info.width);
PixelIndexTablePtr[ i ] +=
((i % pbi->HFragments) * HFRAGPIXELS);
}
PixelIndexTablePtr = &pbi->pixel_index_table[pbi->YPlaneFragments];
for ( i = 0; i < ((pbi->HFragments >> 1) * pbi->VFragments); i++ ) {
PixelIndexTablePtr[ i ] =
((i / (pbi->HFragments / 2) ) *
(VFRAGPIXELS *
(pbi->info.width / 2)) );
PixelIndexTablePtr[ i ] +=
((i % (pbi->HFragments / 2) ) *
HFRAGPIXELS) + pbi->YPlaneSize;
}
/************************************************************************/
/* Now calculate the pixel index table for image reconstruction buffers */
PixelIndexTablePtr = pbi->recon_pixel_index_table;
for ( i = 0; i < pbi->YPlaneFragments; i++ ){
PixelIndexTablePtr[ i ] =
((i / pbi->HFragments) * VFRAGPIXELS *
pbi->YStride);
PixelIndexTablePtr[ i ] +=
((i % pbi->HFragments) * HFRAGPIXELS) +
pbi->ReconYDataOffset;
}
/* U blocks */
PixelIndexTablePtr = &pbi->recon_pixel_index_table[pbi->YPlaneFragments];
for ( i = 0; i < pbi->UVPlaneFragments; i++ ) {
PixelIndexTablePtr[ i ] =
((i / (pbi->HFragments / 2) ) *
(VFRAGPIXELS * (pbi->UVStride)) );
PixelIndexTablePtr[ i ] +=
((i % (pbi->HFragments / 2) ) *
HFRAGPIXELS) + pbi->ReconUDataOffset;
}
/* V blocks */
PixelIndexTablePtr =
&pbi->recon_pixel_index_table[pbi->YPlaneFragments +
pbi->UVPlaneFragments];
for ( i = 0; i < pbi->UVPlaneFragments; i++ ) {
PixelIndexTablePtr[ i ] =
((i / (pbi->HFragments / 2) ) *
(VFRAGPIXELS * (pbi->UVStride)) );
PixelIndexTablePtr[ i ] +=
((i % (pbi->HFragments / 2) ) * HFRAGPIXELS) +
pbi->ReconVDataOffset;
}
}
void ClearFragmentInfo(PB_INSTANCE * pbi){
/* free prior allocs if present */
if(pbi->display_fragments) _ogg_free(pbi->display_fragments);
if(pbi->pixel_index_table) _ogg_free(pbi->pixel_index_table);
if(pbi->recon_pixel_index_table) _ogg_free(pbi->recon_pixel_index_table);
if(pbi->FragTokenCounts) _ogg_free(pbi->FragTokenCounts);
if(pbi->CodedBlockList) _ogg_free(pbi->CodedBlockList);
if(pbi->FragMVect) _ogg_free(pbi->FragMVect);
if(pbi->FragCoeffs) _ogg_free(pbi->FragCoeffs);
if(pbi->FragCoefEOB) _ogg_free(pbi->FragCoefEOB);
if(pbi->skipped_display_fragments) _ogg_free(pbi->skipped_display_fragments);
if(pbi->QFragData) _ogg_free(pbi->QFragData);
if(pbi->TokenList) _ogg_free(pbi->TokenList);
if(pbi->FragCodingMethod) _ogg_free(pbi->FragCodingMethod);
if(pbi->FragCoordinates) _ogg_free(pbi->FragCoordinates);
if(pbi->FragQIndex) _ogg_free(pbi->FragQIndex);
if(pbi->PPCoefBuffer) _ogg_free(pbi->PPCoefBuffer);
if(pbi->FragmentVariances) _ogg_free(pbi->FragmentVariances);
if(pbi->BlockMap) _ogg_free(pbi->BlockMap);
if(pbi->SBCodedFlags) _ogg_free(pbi->SBCodedFlags);
if(pbi->SBFullyFlags) _ogg_free(pbi->SBFullyFlags);
if(pbi->MBFullyFlags) _ogg_free(pbi->MBFullyFlags);
if(pbi->MBCodedFlags) _ogg_free(pbi->MBCodedFlags);
if(pbi->_Nodes) _ogg_free(pbi->_Nodes);
/* reset every pointer once, in the same order as the frees above */
pbi->display_fragments = 0;
pbi->pixel_index_table = 0;
pbi->recon_pixel_index_table = 0;
pbi->FragTokenCounts = 0;
pbi->CodedBlockList = 0;
pbi->FragMVect = 0;
pbi->FragCoeffs = 0;
pbi->FragCoefEOB = 0;
pbi->skipped_display_fragments = 0;
pbi->QFragData = 0;
pbi->TokenList = 0;
pbi->FragCodingMethod = 0;
pbi->FragCoordinates = 0;
pbi->FragQIndex = 0;
pbi->PPCoefBuffer = 0;
pbi->FragmentVariances = 0;
pbi->BlockMap = 0;
pbi->SBCodedFlags = 0;
pbi->SBFullyFlags = 0;
pbi->MBFullyFlags = 0;
pbi->MBCodedFlags = 0;
pbi->_Nodes = 0;
}
void InitFragmentInfo(PB_INSTANCE * pbi){
/* clear any existing info */
ClearFragmentInfo(pbi);
/* Perform Fragment Allocations */
pbi->display_fragments =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->display_fragments));
pbi->pixel_index_table =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->pixel_index_table));
pbi->recon_pixel_index_table =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->recon_pixel_index_table));
pbi->FragTokenCounts =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->FragTokenCounts));
pbi->CodedBlockList =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->CodedBlockList));
pbi->FragMVect =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->FragMVect));
pbi->FragCoeffs =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->FragCoeffs));
pbi->FragCoefEOB =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->FragCoefEOB));
pbi->skipped_display_fragments =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->skipped_display_fragments));
pbi->QFragData =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->QFragData));
pbi->TokenList =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->TokenList));
pbi->FragCodingMethod =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->FragCodingMethod));
pbi->FragCoordinates =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->FragCoordinates));
pbi->FragQIndex =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->FragQIndex));
pbi->PPCoefBuffer =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->PPCoefBuffer));
pbi->FragmentVariances =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->FragmentVariances));
pbi->_Nodes =
_ogg_malloc(pbi->UnitFragments * sizeof(*pbi->_Nodes));
/* Super Block Initialization */
pbi->SBCodedFlags =
_ogg_malloc(pbi->SuperBlocks * sizeof(*pbi->SBCodedFlags));
pbi->SBFullyFlags =
_ogg_malloc(pbi->SuperBlocks * sizeof(*pbi->SBFullyFlags));
/* Macro Block Initialization */
pbi->MBCodedFlags =
_ogg_malloc(pbi->MacroBlocks * sizeof(*pbi->MBCodedFlags));
pbi->MBFullyFlags =
_ogg_malloc(pbi->MacroBlocks * sizeof(*pbi->MBFullyFlags));
pbi->BlockMap =
_ogg_malloc(pbi->SuperBlocks * sizeof(*pbi->BlockMap));
}
void ClearFrameInfo(PB_INSTANCE * pbi){
if(pbi->ThisFrameRecon )
_ogg_free(pbi->ThisFrameRecon );
if(pbi->GoldenFrame)
_ogg_free(pbi->GoldenFrame);
if(pbi->LastFrameRecon)
_ogg_free(pbi->LastFrameRecon);
if(pbi->PostProcessBuffer)
_ogg_free(pbi->PostProcessBuffer);
pbi->ThisFrameRecon = 0;
pbi->GoldenFrame = 0;
pbi->LastFrameRecon = 0;
pbi->PostProcessBuffer = 0;
}
void InitFrameInfo(PB_INSTANCE * pbi, unsigned int FrameSize){
/* clear any existing info */
ClearFrameInfo(pbi);
/* allocate frames */
pbi->ThisFrameRecon =
_ogg_malloc(FrameSize*sizeof(*pbi->ThisFrameRecon));
pbi->GoldenFrame =
_ogg_malloc(FrameSize*sizeof(*pbi->GoldenFrame));
pbi->LastFrameRecon =
_ogg_malloc(FrameSize*sizeof(*pbi->LastFrameRecon));
pbi->PostProcessBuffer =
_ogg_malloc(FrameSize*sizeof(*pbi->PostProcessBuffer));
}
void InitFrameDetails(PB_INSTANCE *pbi){
int FrameSize;
/*pbi->PostProcessingLevel = 0;
pbi->PostProcessingLevel = 4;
pbi->PostProcessingLevel = 5;
pbi->PostProcessingLevel = 6;*/
pbi->PostProcessingLevel = 0;
/* Set the frame size etc. */
pbi->YPlaneSize = pbi->info.width *
pbi->info.height;
pbi->UVPlaneSize = pbi->YPlaneSize / 4;
pbi->HFragments = pbi->info.width / HFRAGPIXELS;
pbi->VFragments = pbi->info.height / VFRAGPIXELS;
pbi->UnitFragments = ((pbi->VFragments * pbi->HFragments)*3)/2;
pbi->YPlaneFragments = pbi->HFragments * pbi->VFragments;
pbi->UVPlaneFragments = pbi->YPlaneFragments / 4;
pbi->YStride = (pbi->info.width + STRIDE_EXTRA);
pbi->UVStride = pbi->YStride / 2;
pbi->ReconYPlaneSize = pbi->YStride *
(pbi->info.height + STRIDE_EXTRA);
pbi->ReconUVPlaneSize = pbi->ReconYPlaneSize / 4;
FrameSize = pbi->ReconYPlaneSize + 2 * pbi->ReconUVPlaneSize;
pbi->YDataOffset = 0;
pbi->UDataOffset = pbi->YPlaneSize;
pbi->VDataOffset = pbi->YPlaneSize + pbi->UVPlaneSize;
pbi->ReconYDataOffset =
(pbi->YStride * UMV_BORDER) + UMV_BORDER;
pbi->ReconUDataOffset = pbi->ReconYPlaneSize +
(pbi->UVStride * (UMV_BORDER/2)) + (UMV_BORDER/2);
pbi->ReconVDataOffset = pbi->ReconYPlaneSize + pbi->ReconUVPlaneSize +
(pbi->UVStride * (UMV_BORDER/2)) + (UMV_BORDER/2);
/* Image dimensions in Super-Blocks */
pbi->YSBRows = (pbi->info.height/32) +
( pbi->info.height%32 ? 1 : 0 );
pbi->YSBCols = (pbi->info.width/32) +
( pbi->info.width%32 ? 1 : 0 );
pbi->UVSBRows = ((pbi->info.height/2)/32) +
( (pbi->info.height/2)%32 ? 1 : 0 );
pbi->UVSBCols = ((pbi->info.width/2)/32) +
( (pbi->info.width/2)%32 ? 1 : 0 );
/* Super-Blocks per component */
pbi->YSuperBlocks = pbi->YSBRows * pbi->YSBCols;
pbi->UVSuperBlocks = pbi->UVSBRows * pbi->UVSBCols;
pbi->SuperBlocks = pbi->YSuperBlocks+2*pbi->UVSuperBlocks;
/* Useful externals */
pbi->MacroBlocks = ((pbi->VFragments+1)/2)*((pbi->HFragments+1)/2);
InitFragmentInfo(pbi);
InitFrameInfo(pbi, FrameSize);
InitializeFragCoordinates(pbi);
/* Configure mapping between quad-tree and fragments */
CreateBlockMapping ( pbi->BlockMap, pbi->YSuperBlocks,
pbi->UVSuperBlocks, pbi->HFragments, pbi->VFragments);
/* Re-initialise the pixel index table. */
CalcPixelIndexTable( pbi );
}
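
The sizing arithmetic in InitFrameDetails above can be checked in isolation. The following is a minimal sketch, not the library's code: the `frame_geometry` type and the `compute_geometry` and `ceil_div` helpers are hypothetical, and the constant values (8-pixel fragments, a 16-pixel UMV border, `STRIDE_EXTRA` of twice the border) are assumptions, since their definitions are not part of this diff.

```c
#include <assert.h>

/* Assumed values: these constants are defined outside the excerpt. */
#define HFRAGPIXELS 8
#define VFRAGPIXELS 8
#define UMV_BORDER 16
#define STRIDE_EXTRA (UMV_BORDER * 2)

/* Hypothetical container for the derived sizes. */
typedef struct {
  unsigned h_frags, v_frags, unit_frags;
  unsigned y_stride, recon_y_size, frame_size;
  unsigned y_sb_rows, y_sb_cols, super_blocks, macro_blocks;
} frame_geometry;

/* Integer division rounded up, as done for the super-block counts. */
static unsigned ceil_div(unsigned n, unsigned d) {
  return n / d + (n % d ? 1 : 0);
}

static frame_geometry compute_geometry(unsigned width, unsigned height) {
  frame_geometry g;
  unsigned uv_sb_rows, uv_sb_cols, recon_uv_size;

  assert(width > 0 && height > 0);

  /* 8x8 fragments: one luma plane plus two quarter-size chroma planes. */
  g.h_frags = width / HFRAGPIXELS;
  g.v_frags = height / VFRAGPIXELS;
  g.unit_frags = (g.v_frags * g.h_frags * 3) / 2;

  /* Reconstruction buffers carry a border on every side. */
  g.y_stride = width + STRIDE_EXTRA;
  g.recon_y_size = g.y_stride * (height + STRIDE_EXTRA);
  recon_uv_size = g.recon_y_size / 4;
  g.frame_size = g.recon_y_size + 2 * recon_uv_size;

  /* Super-blocks cover 32x32 pixels; partial ones at the edges count. */
  g.y_sb_rows = ceil_div(height, 32);
  g.y_sb_cols = ceil_div(width, 32);
  uv_sb_rows = ceil_div(height / 2, 32);
  uv_sb_cols = ceil_div(width / 2, 32);
  g.super_blocks = g.y_sb_rows * g.y_sb_cols + 2 * uv_sb_rows * uv_sb_cols;

  /* Macro-blocks cover 2x2 luma fragments, rounded up. */
  g.macro_blocks = ((g.v_frags + 1) / 2) * ((g.h_frags + 1) / 2);
  return g;
}
```

Under these assumptions a 320x240 frame gives 40x30 luma fragments and 1800 fragments in total, which is the element count InitFragmentInfo passes to each per-fragment allocation.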

File diff suppressed because it is too large


@ -1,767 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: mcomp.c 15153 2008-08-04 18:37:55Z tterribe $
********************************************************************/
#include <stdlib.h>
#include <stdio.h>
#include "codec_internal.h"
/* Initialises motion compensation. */
void InitMotionCompensation ( CP_INSTANCE *cpi ){
int i;
int SearchSite=0;
int Len;
int LineStepY = (ogg_int32_t)cpi->pb.YStride;
Len=((MAX_MV_EXTENT/2)+1)/2;
/* How many search stages are there. */
cpi->MVSearchSteps = 0;
/* Set up offsets arrays used in half pixel correction. */
cpi->HalfPixelRef2Offset[0] = -LineStepY - 1;
cpi->HalfPixelRef2Offset[1] = -LineStepY;
cpi->HalfPixelRef2Offset[2] = -LineStepY + 1;
cpi->HalfPixelRef2Offset[3] = - 1;
cpi->HalfPixelRef2Offset[4] = 0;
cpi->HalfPixelRef2Offset[5] = 1;
cpi->HalfPixelRef2Offset[6] = LineStepY - 1;
cpi->HalfPixelRef2Offset[7] = LineStepY;
cpi->HalfPixelRef2Offset[8] = LineStepY + 1;
cpi->HalfPixelXOffset[0] = -1;
cpi->HalfPixelXOffset[1] = 0;
cpi->HalfPixelXOffset[2] = 1;
cpi->HalfPixelXOffset[3] = -1;
cpi->HalfPixelXOffset[4] = 0;
cpi->HalfPixelXOffset[5] = 1;
cpi->HalfPixelXOffset[6] = -1;
cpi->HalfPixelXOffset[7] = 0;
cpi->HalfPixelXOffset[8] = 1;
cpi->HalfPixelYOffset[0] = -1;
cpi->HalfPixelYOffset[1] = -1;
cpi->HalfPixelYOffset[2] = -1;
cpi->HalfPixelYOffset[3] = 0;
cpi->HalfPixelYOffset[4] = 0;
cpi->HalfPixelYOffset[5] = 0;
cpi->HalfPixelYOffset[6] = 1;
cpi->HalfPixelYOffset[7] = 1;
cpi->HalfPixelYOffset[8] = 1;
/* Generate offsets for 8 search sites per step. */
while ( Len>0 ) {
/* Another step. */
cpi->MVSearchSteps += 1;
/* Compute offsets for search sites. */
cpi->MVOffsetX[SearchSite] = -Len;
cpi->MVOffsetY[SearchSite++] = -Len;
cpi->MVOffsetX[SearchSite] = 0;
cpi->MVOffsetY[SearchSite++] = -Len;
cpi->MVOffsetX[SearchSite] = Len;
cpi->MVOffsetY[SearchSite++] = -Len;
cpi->MVOffsetX[SearchSite] = -Len;
cpi->MVOffsetY[SearchSite++] = 0;
cpi->MVOffsetX[SearchSite] = Len;
cpi->MVOffsetY[SearchSite++] = 0;
cpi->MVOffsetX[SearchSite] = -Len;
cpi->MVOffsetY[SearchSite++] = Len;
cpi->MVOffsetX[SearchSite] = 0;
cpi->MVOffsetY[SearchSite++] = Len;
cpi->MVOffsetX[SearchSite] = Len;
cpi->MVOffsetY[SearchSite++] = Len;
/* Contract. */
Len /= 2;
}
/* Compute pixel index offsets. */
for ( i=SearchSite-1; i>=0; i-- )
cpi->MVPixelOffsetY[i] = (cpi->MVOffsetY[i]*LineStepY) + cpi->MVOffsetX[i];
}
static ogg_uint32_t GetInterErr (CP_INSTANCE *cpi, unsigned char * NewDataPtr,
unsigned char * RefDataPtr1,
unsigned char * RefDataPtr2,
ogg_uint32_t PixelsPerLine ) {
ogg_int32_t DiffVal;
ogg_int32_t RefOffset = (int)(RefDataPtr1 - RefDataPtr2);
ogg_uint32_t RefPixelsPerLine = PixelsPerLine + STRIDE_EXTRA;
/* Mode of interpolation chosen based on the offset of the
second reference pointer */
if ( RefOffset == 0 ) {
DiffVal = dsp_inter8x8_err (cpi->dsp, NewDataPtr, PixelsPerLine,
RefDataPtr1, RefPixelsPerLine);
}else{
DiffVal = dsp_inter8x8_err_xy2 (cpi->dsp, NewDataPtr, PixelsPerLine,
RefDataPtr1,
RefDataPtr2, RefPixelsPerLine);
}
/* Compute and return population variance as mis-match metric. */
return DiffVal;
}
static ogg_uint32_t GetHalfPixelSumAbsDiffs (CP_INSTANCE *cpi,
unsigned char * SrcData,
unsigned char * RefDataPtr1,
unsigned char * RefDataPtr2,
ogg_uint32_t PixelsPerLine,
ogg_uint32_t ErrorSoFar,
ogg_uint32_t BestSoFar ) {
ogg_uint32_t DiffVal = ErrorSoFar;
ogg_int32_t RefOffset = (int)(RefDataPtr1 - RefDataPtr2);
ogg_uint32_t RefPixelsPerLine = PixelsPerLine + STRIDE_EXTRA;
if ( RefOffset == 0 ) {
/* Simple case as for non 0.5 pixel */
DiffVal += dsp_sad8x8 (cpi->dsp, SrcData, PixelsPerLine,
RefDataPtr1, RefPixelsPerLine);
} else {
DiffVal += dsp_sad8x8_xy2_thres (cpi->dsp, SrcData, PixelsPerLine,
RefDataPtr1,
RefDataPtr2, RefPixelsPerLine, BestSoFar);
}
return DiffVal;
}
ogg_uint32_t GetMBIntraError (CP_INSTANCE *cpi, ogg_uint32_t FragIndex,
ogg_uint32_t PixelsPerLine ) {
ogg_uint32_t LocalFragIndex = FragIndex;
ogg_uint32_t IntraError = 0;
dsp_save_fpu (cpi->dsp);
/* Add together the intra errors for those blocks in the macro block
that are coded (Y only) */
if ( cpi->pb.display_fragments[LocalFragIndex] )
IntraError +=
dsp_intra8x8_err (cpi->dsp, &cpi->
ConvDestBuffer[cpi->pb.pixel_index_table[LocalFragIndex]],
PixelsPerLine);
LocalFragIndex++;
if ( cpi->pb.display_fragments[LocalFragIndex] )
IntraError +=
dsp_intra8x8_err (cpi->dsp, &cpi->
ConvDestBuffer[cpi->pb.pixel_index_table[LocalFragIndex]],
PixelsPerLine);
LocalFragIndex = FragIndex + cpi->pb.HFragments;
if ( cpi->pb.display_fragments[LocalFragIndex] )
IntraError +=
dsp_intra8x8_err (cpi->dsp, &cpi->
ConvDestBuffer[cpi->pb.pixel_index_table[LocalFragIndex]],
PixelsPerLine);
LocalFragIndex++;
if ( cpi->pb.display_fragments[LocalFragIndex] )
IntraError +=
dsp_intra8x8_err (cpi->dsp, &cpi->
ConvDestBuffer[cpi->pb.pixel_index_table[LocalFragIndex]],
PixelsPerLine);
dsp_restore_fpu (cpi->dsp);
return IntraError;
}
ogg_uint32_t GetMBInterError (CP_INSTANCE *cpi,
unsigned char * SrcPtr,
unsigned char * RefPtr,
ogg_uint32_t FragIndex,
ogg_int32_t LastXMV,
ogg_int32_t LastYMV,
ogg_uint32_t PixelsPerLine ) {
ogg_uint32_t RefPixelsPerLine = cpi->pb.YStride;
ogg_uint32_t LocalFragIndex = FragIndex;
ogg_int32_t PixelIndex;
ogg_int32_t RefPixelIndex;
ogg_int32_t RefPixelOffset;
ogg_int32_t RefPtr2Offset;
ogg_uint32_t InterError = 0;
unsigned char * SrcPtr1;
unsigned char * RefPtr1;
dsp_save_fpu (cpi->dsp);
/* Work out pixel offset into source buffer. */
PixelIndex = cpi->pb.pixel_index_table[LocalFragIndex];
/* Work out the pixel offset in reference buffer for the default
motion vector */
RefPixelIndex = cpi->pb.recon_pixel_index_table[LocalFragIndex];
RefPixelOffset = ((LastYMV/2) * RefPixelsPerLine) + (LastXMV/2);
/* Work out the second reference pointer offset. */
RefPtr2Offset = 0;
if ( LastXMV % 2 ) {
if ( LastXMV > 0 )
RefPtr2Offset += 1;
else
RefPtr2Offset -= 1;
}
if ( LastYMV % 2 ) {
if ( LastYMV > 0 )
RefPtr2Offset += RefPixelsPerLine;
else
RefPtr2Offset -= RefPixelsPerLine;
}
/* Add together the errors for those blocks in the macro block that
are coded (Y only) */
if ( cpi->pb.display_fragments[LocalFragIndex] ) {
SrcPtr1 = &SrcPtr[PixelIndex];
RefPtr1 = &RefPtr[RefPixelIndex + RefPixelOffset];
InterError += GetInterErr(cpi, SrcPtr1, RefPtr1,
&RefPtr1[RefPtr2Offset], PixelsPerLine );
}
LocalFragIndex++;
if ( cpi->pb.display_fragments[LocalFragIndex] ) {
PixelIndex = cpi->pb.pixel_index_table[LocalFragIndex];
RefPixelIndex = cpi->pb.recon_pixel_index_table[LocalFragIndex];
SrcPtr1 = &SrcPtr[PixelIndex];
RefPtr1 = &RefPtr[RefPixelIndex + RefPixelOffset];
InterError += GetInterErr(cpi, SrcPtr1, RefPtr1,
&RefPtr1[RefPtr2Offset], PixelsPerLine );
}
LocalFragIndex = FragIndex + cpi->pb.HFragments;
if ( cpi->pb.display_fragments[LocalFragIndex] ) {
PixelIndex = cpi->pb.pixel_index_table[LocalFragIndex];
RefPixelIndex = cpi->pb.recon_pixel_index_table[LocalFragIndex];
SrcPtr1 = &SrcPtr[PixelIndex];
RefPtr1 = &RefPtr[RefPixelIndex + RefPixelOffset];
InterError += GetInterErr(cpi, SrcPtr1, RefPtr1,
&RefPtr1[RefPtr2Offset], PixelsPerLine );
}
LocalFragIndex++;
if ( cpi->pb.display_fragments[LocalFragIndex] ) {
PixelIndex = cpi->pb.pixel_index_table[LocalFragIndex];
RefPixelIndex = cpi->pb.recon_pixel_index_table[LocalFragIndex];
SrcPtr1 = &SrcPtr[PixelIndex];
RefPtr1 = &RefPtr[RefPixelIndex + RefPixelOffset];
InterError += GetInterErr(cpi, SrcPtr1, RefPtr1,
&RefPtr1[RefPtr2Offset], PixelsPerLine );
}
dsp_restore_fpu (cpi->dsp);
return InterError;
}
ogg_uint32_t GetMBMVInterError (CP_INSTANCE *cpi,
unsigned char * RefFramePtr,
ogg_uint32_t FragIndex,
ogg_uint32_t PixelsPerLine,
ogg_int32_t *MVPixelOffset,
MOTION_VECTOR *MV ) {
ogg_uint32_t Error = 0;
ogg_uint32_t MinError;
ogg_uint32_t InterMVError = 0;
ogg_int32_t i;
ogg_int32_t x=0, y=0;
ogg_int32_t step;
ogg_int32_t SearchSite=0;
unsigned char *SrcPtr[4] = {NULL,NULL,NULL,NULL};
unsigned char *RefPtr=NULL;
unsigned char *CandidateBlockPtr=NULL;
unsigned char *BestBlockPtr=NULL;
ogg_uint32_t RefRow2Offset = cpi->pb.YStride * 8;
int MBlockDispFrags[4];
/* Half pixel variables */
ogg_int32_t HalfPixelError;
ogg_int32_t BestHalfPixelError;
unsigned char BestHalfOffset;
unsigned char * RefDataPtr1;
unsigned char * RefDataPtr2;
dsp_save_fpu (cpi->dsp);
/* Note which of the four blocks in the macro block are to be
included in the search. */
MBlockDispFrags[0] =
cpi->pb.display_fragments[FragIndex];
MBlockDispFrags[1] =
cpi->pb.display_fragments[FragIndex + 1];
MBlockDispFrags[2] =
cpi->pb.display_fragments[FragIndex + cpi->pb.HFragments];
MBlockDispFrags[3] =
cpi->pb.display_fragments[FragIndex + cpi->pb.HFragments + 1];
/* Set up the source pointers for the four source blocks. */
SrcPtr[0] = &cpi->ConvDestBuffer[cpi->pb.pixel_index_table[FragIndex]];
SrcPtr[1] = SrcPtr[0] + 8;
SrcPtr[2] = SrcPtr[0] + (PixelsPerLine * 8);
SrcPtr[3] = SrcPtr[2] + 8;
/* Set starting reference point for search. */
RefPtr = &RefFramePtr[cpi->pb.recon_pixel_index_table[FragIndex]];
/* Check the 0,0 candidate. */
if ( MBlockDispFrags[0] ) {
Error += dsp_sad8x8 (cpi->dsp, SrcPtr[0], PixelsPerLine, RefPtr,
PixelsPerLine + STRIDE_EXTRA);
}
if ( MBlockDispFrags[1] ) {
Error += dsp_sad8x8 (cpi->dsp, SrcPtr[1], PixelsPerLine, RefPtr + 8,
PixelsPerLine + STRIDE_EXTRA);
}
if ( MBlockDispFrags[2] ) {
Error += dsp_sad8x8 (cpi->dsp, SrcPtr[2], PixelsPerLine, RefPtr + RefRow2Offset,
PixelsPerLine + STRIDE_EXTRA);
}
if ( MBlockDispFrags[3] ) {
Error += dsp_sad8x8 (cpi->dsp, SrcPtr[3], PixelsPerLine, RefPtr + RefRow2Offset + 8,
PixelsPerLine + STRIDE_EXTRA);
}
/* Set starting values to results of 0, 0 vector. */
MinError = Error;
BestBlockPtr = RefPtr;
x = 0;
y = 0;
MV->x = 0;
MV->y = 0;
/* Proceed through N-steps. */
for ( step=0; step<cpi->MVSearchSteps; step++ ) {
/* Search the 8-neighbours at distance pertinent to current step.*/
for ( i=0; i<8; i++ ) {
/* Set pointer to next candidate matching block. */
CandidateBlockPtr = RefPtr + MVPixelOffset[SearchSite];
/* Reset error */
Error = 0;
/* Get the score for the current offset */
if ( MBlockDispFrags[0] ) {
Error += dsp_sad8x8 (cpi->dsp, SrcPtr[0], PixelsPerLine, CandidateBlockPtr,
PixelsPerLine + STRIDE_EXTRA);
}
if ( MBlockDispFrags[1] && (Error < MinError) ) {
Error += dsp_sad8x8_thres (cpi->dsp, SrcPtr[1], PixelsPerLine, CandidateBlockPtr + 8,
PixelsPerLine + STRIDE_EXTRA, MinError);
}
if ( MBlockDispFrags[2] && (Error < MinError) ) {
Error += dsp_sad8x8_thres (cpi->dsp, SrcPtr[2], PixelsPerLine, CandidateBlockPtr + RefRow2Offset,
PixelsPerLine + STRIDE_EXTRA, MinError);
}
if ( MBlockDispFrags[3] && (Error < MinError) ) {
Error += dsp_sad8x8_thres (cpi->dsp, SrcPtr[3], PixelsPerLine, CandidateBlockPtr + RefRow2Offset + 8,
PixelsPerLine + STRIDE_EXTRA, MinError);
}
if ( Error < MinError ) {
/* Remember best match. */
MinError = Error;
BestBlockPtr = CandidateBlockPtr;
/* Where is it. */
x = MV->x + cpi->MVOffsetX[SearchSite];
y = MV->y + cpi->MVOffsetY[SearchSite];
}
/* Move to next search location. */
SearchSite += 1;
}
/* Move to best location this step. */
RefPtr = BestBlockPtr;
MV->x = x;
MV->y = y;
}
/* Factor vectors to 1/2 pixel resolution. */
MV->x = (MV->x * 2);
MV->y = (MV->y * 2);
/* Now do the half pixel pass */
BestHalfOffset = 4; /* Default to the no offset case. */
BestHalfPixelError = MinError;
/* Get the half pixel error for each half pixel offset */
for ( i=0; i < 9; i++ ) {
HalfPixelError = 0;
if ( MBlockDispFrags[0] ) {
RefDataPtr1 = BestBlockPtr;
RefDataPtr2 = RefDataPtr1 + cpi->HalfPixelRef2Offset[i];
HalfPixelError =
GetHalfPixelSumAbsDiffs(cpi, SrcPtr[0], RefDataPtr1, RefDataPtr2,
PixelsPerLine, HalfPixelError, BestHalfPixelError );
}
if ( MBlockDispFrags[1] && (HalfPixelError < BestHalfPixelError) ) {
RefDataPtr1 = BestBlockPtr + 8;
RefDataPtr2 = RefDataPtr1 + cpi->HalfPixelRef2Offset[i];
HalfPixelError =
GetHalfPixelSumAbsDiffs(cpi, SrcPtr[1], RefDataPtr1, RefDataPtr2,
PixelsPerLine, HalfPixelError, BestHalfPixelError );
}
if ( MBlockDispFrags[2] && (HalfPixelError < BestHalfPixelError) ) {
RefDataPtr1 = BestBlockPtr + RefRow2Offset;
RefDataPtr2 = RefDataPtr1 + cpi->HalfPixelRef2Offset[i];
HalfPixelError =
GetHalfPixelSumAbsDiffs(cpi, SrcPtr[2], RefDataPtr1, RefDataPtr2,
PixelsPerLine, HalfPixelError, BestHalfPixelError );
}
if ( MBlockDispFrags[3] && (HalfPixelError < BestHalfPixelError) ) {
RefDataPtr1 = BestBlockPtr + RefRow2Offset + 8;
RefDataPtr2 = RefDataPtr1 + cpi->HalfPixelRef2Offset[i];
HalfPixelError =
GetHalfPixelSumAbsDiffs(cpi, SrcPtr[3], RefDataPtr1, RefDataPtr2,
PixelsPerLine, HalfPixelError, BestHalfPixelError );
}
if ( HalfPixelError < BestHalfPixelError ) {
BestHalfOffset = (unsigned char)i;
BestHalfPixelError = HalfPixelError;
}
}
/* Half pixel adjust the MV */
MV->x += cpi->HalfPixelXOffset[BestHalfOffset];
MV->y += cpi->HalfPixelYOffset[BestHalfOffset];
/* Get the error score for the chosen 1/2 pixel offset as a variance. */
InterMVError = GetMBInterError( cpi, cpi->ConvDestBuffer, RefFramePtr,
FragIndex, MV->x, MV->y, PixelsPerLine );
dsp_restore_fpu (cpi->dsp);
/* Return score of best matching block. */
return InterMVError;
}
ogg_uint32_t GetMBMVExhaustiveSearch (CP_INSTANCE *cpi,
unsigned char * RefFramePtr,
ogg_uint32_t FragIndex,
ogg_uint32_t PixelsPerLine,
MOTION_VECTOR *MV ) {
ogg_uint32_t Error = 0;
ogg_uint32_t MinError = HUGE_ERROR;
ogg_uint32_t InterMVError = 0;
ogg_int32_t i, j;
ogg_int32_t x=0, y=0;
unsigned char *SrcPtr[4] = {NULL,NULL,NULL,NULL};
unsigned char *RefPtr;
unsigned char *CandidateBlockPtr=NULL;
unsigned char *BestBlockPtr=NULL;
ogg_uint32_t RefRow2Offset = cpi->pb.YStride * 8;
int MBlockDispFrags[4];
/* Half pixel variables */
ogg_int32_t HalfPixelError;
ogg_int32_t BestHalfPixelError;
unsigned char BestHalfOffset;
unsigned char * RefDataPtr1;
unsigned char * RefDataPtr2;
dsp_save_fpu (cpi->dsp);
/* Note which of the four blocks in the macro block are to be
included in the search. */
MBlockDispFrags[0] = cpi->
pb.display_fragments[FragIndex];
MBlockDispFrags[1] = cpi->
pb.display_fragments[FragIndex + 1];
MBlockDispFrags[2] = cpi->
pb.display_fragments[FragIndex + cpi->pb.HFragments];
MBlockDispFrags[3] = cpi->
pb.display_fragments[FragIndex + cpi->pb.HFragments + 1];
/* Set up the source pointers for the four source blocks. */
SrcPtr[0] = &cpi->
ConvDestBuffer[cpi->pb.pixel_index_table[FragIndex]];
SrcPtr[1] = SrcPtr[0] + 8;
SrcPtr[2] = SrcPtr[0] + (PixelsPerLine * 8);
SrcPtr[3] = SrcPtr[2] + 8;
RefPtr = &RefFramePtr[cpi->pb.recon_pixel_index_table[FragIndex]];
RefPtr = RefPtr - ((MAX_MV_EXTENT/2) * cpi->
pb.YStride) - (MAX_MV_EXTENT/2);
/* Search each pixel aligned site */
for ( i = 0; i < (ogg_int32_t)MAX_MV_EXTENT; i ++ ) {
/* Starting position in row */
CandidateBlockPtr = RefPtr;
for ( j = 0; j < (ogg_int32_t)MAX_MV_EXTENT; j++ ) {
/* Reset error */
Error = 0;
/* Sum errors for each block. */
if ( MBlockDispFrags[0] ) {
Error += dsp_sad8x8 (cpi->dsp, SrcPtr[0], PixelsPerLine, CandidateBlockPtr,
PixelsPerLine + STRIDE_EXTRA);
}
if ( MBlockDispFrags[1] ){
Error += dsp_sad8x8 (cpi->dsp, SrcPtr[1], PixelsPerLine, CandidateBlockPtr + 8,
PixelsPerLine + STRIDE_EXTRA);
}
if ( MBlockDispFrags[2] ){
Error += dsp_sad8x8 (cpi->dsp, SrcPtr[2], PixelsPerLine, CandidateBlockPtr + RefRow2Offset,
PixelsPerLine + STRIDE_EXTRA);
}
if ( MBlockDispFrags[3] ){
Error += dsp_sad8x8 (cpi->dsp, SrcPtr[3], PixelsPerLine, CandidateBlockPtr + RefRow2Offset + 8,
PixelsPerLine + STRIDE_EXTRA);
}
/* Was this the best so far */
if ( Error < MinError ) {
MinError = Error;
BestBlockPtr = CandidateBlockPtr;
x = 16 + j - MAX_MV_EXTENT;
y = 16 + i - MAX_MV_EXTENT;
}
/* Move to the next site */
CandidateBlockPtr ++;
}
/* Move on to the next row. */
RefPtr += cpi->pb.YStride;
}
/* Factor vectors to 1/2 pixel resolution. */
MV->x = (x * 2);
MV->y = (y * 2);
/* Now do the half pixel pass */
BestHalfOffset = 4; /* Default to the no offset case. */
BestHalfPixelError = MinError;
/* Get the half pixel error for each half pixel offset */
for ( i=0; i < 9; i++ ) {
HalfPixelError = 0;
if ( MBlockDispFrags[0] ) {
RefDataPtr1 = BestBlockPtr;
RefDataPtr2 = RefDataPtr1 + cpi->HalfPixelRef2Offset[i];
HalfPixelError =
GetHalfPixelSumAbsDiffs(cpi, SrcPtr[0], RefDataPtr1, RefDataPtr2,
PixelsPerLine, HalfPixelError, BestHalfPixelError );
}
if ( MBlockDispFrags[1] && (HalfPixelError < BestHalfPixelError) ) {
RefDataPtr1 = BestBlockPtr + 8;
RefDataPtr2 = RefDataPtr1 + cpi->HalfPixelRef2Offset[i];
HalfPixelError =
GetHalfPixelSumAbsDiffs(cpi, SrcPtr[1], RefDataPtr1, RefDataPtr2,
PixelsPerLine, HalfPixelError, BestHalfPixelError );
}
if ( MBlockDispFrags[2] && (HalfPixelError < BestHalfPixelError) ) {
RefDataPtr1 = BestBlockPtr + RefRow2Offset;
RefDataPtr2 = RefDataPtr1 + cpi->HalfPixelRef2Offset[i];
HalfPixelError =
GetHalfPixelSumAbsDiffs(cpi, SrcPtr[2], RefDataPtr1, RefDataPtr2,
PixelsPerLine, HalfPixelError, BestHalfPixelError );
}
if ( MBlockDispFrags[3] && (HalfPixelError < BestHalfPixelError) ) {
RefDataPtr1 = BestBlockPtr + RefRow2Offset + 8;
RefDataPtr2 = RefDataPtr1 + cpi->HalfPixelRef2Offset[i];
HalfPixelError =
GetHalfPixelSumAbsDiffs(cpi, SrcPtr[3], RefDataPtr1, RefDataPtr2,
PixelsPerLine, HalfPixelError, BestHalfPixelError );
}
if ( HalfPixelError < BestHalfPixelError ){
BestHalfOffset = (unsigned char)i;
BestHalfPixelError = HalfPixelError;
}
}
/* Half pixel adjust the MV */
MV->x += cpi->HalfPixelXOffset[BestHalfOffset];
MV->y += cpi->HalfPixelYOffset[BestHalfOffset];
/* Get the error score for the chosen 1/2 pixel offset as a variance. */
InterMVError = GetMBInterError( cpi, cpi->ConvDestBuffer, RefFramePtr,
FragIndex, MV->x, MV->y, PixelsPerLine );
dsp_restore_fpu (cpi->dsp);
/* Return score of best matching block. */
return InterMVError;
}
static ogg_uint32_t GetBMVExhaustiveSearch (CP_INSTANCE *cpi,
unsigned char * RefFramePtr,
ogg_uint32_t FragIndex,
ogg_uint32_t PixelsPerLine,
MOTION_VECTOR *MV ) {
ogg_uint32_t Error = 0;
ogg_uint32_t MinError = HUGE_ERROR;
ogg_uint32_t InterMVError = 0;
ogg_int32_t i, j;
ogg_int32_t x=0, y=0;
unsigned char *SrcPtr = NULL;
unsigned char *RefPtr;
unsigned char *CandidateBlockPtr=NULL;
unsigned char *BestBlockPtr=NULL;
/* Half pixel variables */
ogg_int32_t HalfPixelError;
ogg_int32_t BestHalfPixelError;
unsigned char BestHalfOffset;
unsigned char * RefDataPtr2;
/* Set up the source pointer for the block. */
SrcPtr = &cpi->
ConvDestBuffer[cpi->pb.pixel_index_table[FragIndex]];
RefPtr = &RefFramePtr[cpi->pb.recon_pixel_index_table[FragIndex]];
RefPtr = RefPtr - ((MAX_MV_EXTENT/2) *
cpi->pb.YStride) - (MAX_MV_EXTENT/2);
/* Search each pixel aligned site */
for ( i = 0; i < (ogg_int32_t)MAX_MV_EXTENT; i ++ ) {
/* Starting position in row */
CandidateBlockPtr = RefPtr;
for ( j = 0; j < (ogg_int32_t)MAX_MV_EXTENT; j++ ){
/* Get the block error score. */
Error = dsp_sad8x8 (cpi->dsp, SrcPtr, PixelsPerLine, CandidateBlockPtr,
PixelsPerLine + STRIDE_EXTRA);
/* Was this the best so far */
if ( Error < MinError ) {
MinError = Error;
BestBlockPtr = CandidateBlockPtr;
x = 16 + j - MAX_MV_EXTENT;
y = 16 + i - MAX_MV_EXTENT;
}
/* Move to the next site */
CandidateBlockPtr ++;
}
/* Move on to the next row. */
RefPtr += cpi->pb.YStride;
}
/* Factor vectors to 1/2 pixel resolution. */
MV->x = (x * 2);
MV->y = (y * 2);
/* Now do the half pixel pass */
BestHalfOffset = 4; /* Default to the no offset case. */
BestHalfPixelError = MinError;
/* Get the half pixel error for each half pixel offset */
for ( i=0; i < 9; i++ ) {
RefDataPtr2 = BestBlockPtr + cpi->HalfPixelRef2Offset[i];
HalfPixelError =
GetHalfPixelSumAbsDiffs(cpi, SrcPtr, BestBlockPtr, RefDataPtr2,
PixelsPerLine, 0, BestHalfPixelError );
if ( HalfPixelError < BestHalfPixelError ){
BestHalfOffset = (unsigned char)i;
BestHalfPixelError = HalfPixelError;
}
}
/* Half pixel adjust the MV */
MV->x += cpi->HalfPixelXOffset[BestHalfOffset];
MV->y += cpi->HalfPixelYOffset[BestHalfOffset];
/* Get the variance score at the chosen offset */
RefDataPtr2 = BestBlockPtr + cpi->HalfPixelRef2Offset[BestHalfOffset];
InterMVError =
GetInterErr(cpi, SrcPtr, BestBlockPtr, RefDataPtr2, PixelsPerLine );
/* Return score of best matching block. */
return InterMVError;
}
ogg_uint32_t GetFOURMVExhaustiveSearch (CP_INSTANCE *cpi,
unsigned char * RefFramePtr,
ogg_uint32_t FragIndex,
ogg_uint32_t PixelsPerLine,
MOTION_VECTOR *MV ) {
ogg_uint32_t InterMVError;
dsp_save_fpu (cpi->dsp);
/* For the moment the 4MV mode is only deemed to be valid
if all four Y blocks are to be updated */
/* This may be adapted later. */
if ( cpi->pb.display_fragments[FragIndex] &&
cpi->pb.display_fragments[FragIndex + 1] &&
cpi->pb.display_fragments[FragIndex + cpi->pb.HFragments] &&
cpi->pb.display_fragments[FragIndex + cpi->pb.HFragments + 1] ) {
/* Reset the error score. */
InterMVError = 0;
/* Get the error component from each coded block */
InterMVError +=
GetBMVExhaustiveSearch(cpi, RefFramePtr, FragIndex,
PixelsPerLine, &(MV[0]) );
InterMVError +=
GetBMVExhaustiveSearch(cpi, RefFramePtr, (FragIndex + 1),
PixelsPerLine, &(MV[1]) );
InterMVError +=
GetBMVExhaustiveSearch(cpi, RefFramePtr,
(FragIndex + cpi->pb.HFragments),
PixelsPerLine, &(MV[2]) );
InterMVError +=
GetBMVExhaustiveSearch(cpi, RefFramePtr,
(FragIndex + cpi->pb.HFragments + 1),
PixelsPerLine, &(MV[3]) );
}else{
InterMVError = HUGE_ERROR;
}
dsp_restore_fpu (cpi->dsp);
/* Return score of best matching block. */
return InterMVError;
}
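
The N-step search pattern that InitMotionCompensation builds at the top of this file can be sketched on its own. This is an illustrative sketch under stated assumptions rather than the encoder's code: `site_offset`, `build_search_sites` and `site_pixel_offset` are hypothetical names, and `MAX_MV_EXTENT` is assumed to be 31 because its definition is not shown in this diff.

```c
#include <assert.h>

/* Assumed value: MAX_MV_EXTENT is defined outside the excerpt. */
#define MAX_MV_EXTENT 31

/* Hypothetical pair of whole-pixel offsets for one search site. */
typedef struct { int x, y; } site_offset;

/* Fills sites[] with the 8 neighbours for each step, halving the search
   radius after every step. Returns the number of steps and stores the
   number of sites written in *count. */
static int build_search_sites(site_offset *sites, int max_sites, int *count) {
  /* Same site order as the original: top row, middle row, bottom row. */
  static const int dx[8] = {-1,  0,  1, -1, 1, -1, 0, 1};
  static const int dy[8] = {-1, -1, -1,  0, 0,  1, 1, 1};
  int len = ((MAX_MV_EXTENT / 2) + 1) / 2;
  int steps = 0;
  int n = 0;
  int k;

  while (len > 0) {
    assert(n + 8 <= max_sites);
    for (k = 0; k < 8; k++) {
      sites[n].x = dx[k] * len;
      sites[n].y = dy[k] * len;
      n++;
    }
    steps++;
    len /= 2; /* contract the search radius */
  }
  *count = n;
  return steps;
}

/* Linear offset of a site inside a plane with the given line stride. */
static int site_pixel_offset(site_offset s, int stride) {
  return s.y * stride + s.x;
}
```

With the assumed extent of 31 the radius starts at 8 and halves to 4, 2 and 1, so the search runs 4 steps over 32 sites. GetMBMVInterError walks these sites in order, 8 per step, and re-centres on the best match after each step.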


@ -1,339 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: misc_common.c 15323 2008-09-19 19:43:59Z giles $
********************************************************************/
#include <string.h>
#include "codec_internal.h"
#include "block_inline.h"
#define FIXED_Q 150
#define MAX_UP_REG_LOOPS 2
/* Gives the initial bytes per block estimate for each Q value */
static const double BpbTable[Q_TABLE_SIZE] = {
0.42, 0.45, 0.46, 0.49, 0.51, 0.53, 0.56, 0.58,
0.61, 0.64, 0.68, 0.71, 0.74, 0.77, 0.80, 0.84,
0.89, 0.92, 0.98, 1.01, 1.04, 1.13, 1.17, 1.23,
1.28, 1.34, 1.41, 1.45, 1.51, 1.59, 1.69, 1.80,
1.84, 1.94, 2.02, 2.15, 2.23, 2.34, 2.44, 2.50,
2.69, 2.80, 2.87, 3.04, 3.16, 3.29, 3.59, 3.66,
3.86, 3.94, 4.22, 4.50, 4.64, 4.70, 5.24, 5.34,
5.61, 5.87, 6.11, 6.41, 6.71, 6.99, 7.36, 7.69
};
static const double KfBpbTable[Q_TABLE_SIZE] = {
0.74, 0.81, 0.88, 0.94, 1.00, 1.06, 1.14, 1.19,
1.27, 1.34, 1.42, 1.49, 1.54, 1.59, 1.66, 1.73,
1.80, 1.87, 1.97, 2.01, 2.08, 2.21, 2.25, 2.36,
2.39, 2.50, 2.55, 2.65, 2.71, 2.82, 2.95, 3.01,
3.11, 3.19, 3.31, 3.42, 3.58, 3.66, 3.78, 3.89,
4.11, 4.26, 4.36, 4.39, 4.63, 4.76, 4.85, 5.04,
5.26, 5.29, 5.47, 5.64, 5.76, 6.05, 6.35, 6.67,
6.91, 7.17, 7.40, 7.56, 8.02, 8.45, 8.86, 9.38
};
double GetEstimatedBpb( CP_INSTANCE *cpi, ogg_uint32_t TargetQ ){
ogg_uint32_t i;
ogg_int32_t ThreshTableIndex = Q_TABLE_SIZE - 1;
double BytesPerBlock;
/* Search for the Q table index that matches the given Q. */
for ( i = 0; i < Q_TABLE_SIZE; i++ ) {
if ( TargetQ >= cpi->pb.QThreshTable[i] ) {
ThreshTableIndex = i;
break;
}
}
/* Adjust according to Q shift and type of frame */
if ( cpi->pb.FrameType == KEY_FRAME ) {
/* Get primary prediction */
BytesPerBlock = KfBpbTable[ThreshTableIndex];
} else {
/* Get primary prediction */
BytesPerBlock = BpbTable[ThreshTableIndex];
BytesPerBlock = BytesPerBlock * cpi->BpbCorrectionFactor;
}
return BytesPerBlock;
}
static void UpRegulateMB( CP_INSTANCE *cpi, ogg_uint32_t RegulationQ,
ogg_uint32_t SB, ogg_uint32_t MB, int NoCheck ) {
ogg_int32_t FragIndex;
ogg_uint32_t B;
/* Variables used in calculating corresponding row,col and index in
UV planes */
ogg_uint32_t UVRow;
ogg_uint32_t UVColumn;
ogg_uint32_t UVFragOffset;
/* There may be MB's lying out of frame which must be ignored. For
these MB's Top left block will have a negative Fragment Index. */
if ( QuadMapToMBTopLeft(cpi->pb.BlockMap, SB, MB ) >= 0 ) {
/* Up regulate the component blocks Y then UV. */
for ( B=0; B<4; B++ ){
FragIndex = QuadMapToIndex1( cpi->pb.BlockMap, SB, MB, B );
if ( ( !cpi->pb.display_fragments[FragIndex] ) &&
( (NoCheck) || (cpi->FragmentLastQ[FragIndex] > RegulationQ) ) ){
cpi->pb.display_fragments[FragIndex] = 1;
cpi->extra_fragments[FragIndex] = 1;
cpi->FragmentLastQ[FragIndex] = RegulationQ;
cpi->MotionScore++;
}
}
/* Check the two UV blocks */
FragIndex = QuadMapToMBTopLeft(cpi->pb.BlockMap, SB, MB );
UVRow = (FragIndex / (cpi->pb.HFragments * 2));
UVColumn = (FragIndex % cpi->pb.HFragments) / 2;
UVFragOffset = (UVRow * (cpi->pb.HFragments / 2)) + UVColumn;
FragIndex = cpi->pb.YPlaneFragments + UVFragOffset;
if ( ( !cpi->pb.display_fragments[FragIndex] ) &&
( (NoCheck) || (cpi->FragmentLastQ[FragIndex] > RegulationQ) ) ) {
cpi->pb.display_fragments[FragIndex] = 1;
cpi->extra_fragments[FragIndex] = 1;
cpi->FragmentLastQ[FragIndex] = RegulationQ;
cpi->MotionScore++;
}
FragIndex += cpi->pb.UVPlaneFragments;
if ( ( !cpi->pb.display_fragments[FragIndex] ) &&
( (NoCheck) || (cpi->FragmentLastQ[FragIndex] > RegulationQ) ) ) {
cpi->pb.display_fragments[FragIndex] = 1;
cpi->extra_fragments[FragIndex] = 1;
cpi->FragmentLastQ[FragIndex] = RegulationQ;
cpi->MotionScore++;
}
}
}
static void UpRegulateBlocks (CP_INSTANCE *cpi, ogg_uint32_t RegulationQ,
ogg_int32_t RecoveryBlocks,
ogg_uint32_t * LastSB, ogg_uint32_t * LastMB ) {
ogg_uint32_t LoopTimesRound = 0;
ogg_uint32_t MaxSB = cpi->pb.YSBRows *
cpi->pb.YSBCols; /* Total super blocks in image */
ogg_uint32_t SB, MB; /* Super-Block and macro block indices. */
/* First scan for blocks for which a residue update is outstanding. */
while ( (cpi->MotionScore < RecoveryBlocks) &&
(LoopTimesRound < MAX_UP_REG_LOOPS) ) {
LoopTimesRound++;
for ( SB = (*LastSB); SB < MaxSB; SB++ ) {
/* Check its four Macro-Blocks */
for ( MB=(*LastMB); MB<4; MB++ ) {
/* Mark relevant blocks for update */
UpRegulateMB( cpi, RegulationQ, SB, MB, 0 );
/* Keep track of the last refresh MB. */
(*LastMB) += 1;
if ( (*LastMB) == 4 )
(*LastMB) = 0;
/* Termination clause */
if (cpi->MotionScore >= RecoveryBlocks) {
/* Make sure we don't stall at SB level */
if ( *LastMB == 0 )
SB++;
break;
}
}
/* Termination clause */
if (cpi->MotionScore >= RecoveryBlocks)
break;
}
/* Update super block start index */
if ( SB >= MaxSB){
(*LastSB) = 0;
}else{
(*LastSB) = SB;
}
}
}
void UpRegulateDataStream (CP_INSTANCE *cpi, ogg_uint32_t RegulationQ,
ogg_int32_t RecoveryBlocks ) {
ogg_uint32_t LastPassMBPos = 0;
ogg_uint32_t StdLastMBPos = 0;
ogg_uint32_t MaxSB = cpi->pb.YSBRows *
cpi->pb.YSBCols; /* Total super blocks in image */
ogg_uint32_t SB=0; /* Super-Block index */
ogg_uint32_t MB; /* Macro-Block index */
/* Deduct the number of blocks in an MB / 2 from the recover block count.
This will compensate for the fact that once we start checking an MB
we test every block in that macro block */
if ( RecoveryBlocks > 3 )
RecoveryBlocks -= 3;
/* Up regulate blocks last coded at higher Q */
UpRegulateBlocks( cpi, RegulationQ, RecoveryBlocks,
&cpi->LastEndSB, &StdLastMBPos );
/* If we have still not used up the minimum number of blocks and are
at the minimum Q then run through a final pass of the data to
ensure that each block gets a final refresh. */
if ( (RegulationQ == VERY_BEST_Q) &&
(cpi->MotionScore < RecoveryBlocks) ) {
if ( cpi->FinalPassLastPos < MaxSB ) {
for ( SB = cpi->FinalPassLastPos; SB < MaxSB; SB++ ) {
/* Check its four Macro-Blocks */
for ( MB=LastPassMBPos; MB<4; MB++ ) {
/* Mark relevant blocks for update */
UpRegulateMB( cpi, RegulationQ, SB, MB, 1 );
/* Keep track of the last refresh MB. */
LastPassMBPos += 1;
if ( LastPassMBPos == 4 ) {
LastPassMBPos = 0;
/* Increment SB index */
cpi->FinalPassLastPos += 1;
}
/* Termination clause */
if (cpi->MotionScore >= RecoveryBlocks)
break;
}
/* Termination clause */
if (cpi->MotionScore >= RecoveryBlocks)
break;
}
}
}
}
void RegulateQ( CP_INSTANCE *cpi, ogg_int32_t UpdateScore ) {
double PredUnitScoreBytes;
ogg_uint32_t QIndex = Q_TABLE_SIZE - 1;
ogg_uint32_t i;
if ( UpdateScore > 0 ) {
double TargetUnitScoreBytes = (double)cpi->ThisFrameTargetBytes /
(double)UpdateScore;
double LastBitError = 10000.0; /* Silly high number */
/* Search for the best Q for the target bitrate. */
for ( i = 0; i < Q_TABLE_SIZE; i++ ) {
PredUnitScoreBytes = GetEstimatedBpb( cpi, cpi->pb.QThreshTable[i] );
if ( PredUnitScoreBytes > TargetUnitScoreBytes ) {
if ( (PredUnitScoreBytes - TargetUnitScoreBytes) <= LastBitError ) {
QIndex = i;
} else {
QIndex = i - 1;
}
break;
} else {
LastBitError = TargetUnitScoreBytes - PredUnitScoreBytes;
}
}
}
/* QIndex should now indicate the optimal Q. */
cpi->pb.ThisFrameQualityValue = cpi->pb.QThreshTable[QIndex];
/* Apply range restrictions for key frames. */
if ( cpi->pb.FrameType == KEY_FRAME ) {
if ( cpi->pb.ThisFrameQualityValue > cpi->pb.QThreshTable[20] )
cpi->pb.ThisFrameQualityValue = cpi->pb.QThreshTable[20];
else if ( cpi->pb.ThisFrameQualityValue < cpi->pb.QThreshTable[50] )
cpi->pb.ThisFrameQualityValue = cpi->pb.QThreshTable[50];
}
/* Limit the Q value to the maximum available value */
if (cpi->pb.ThisFrameQualityValue >
cpi->pb.QThreshTable[cpi->Configuration.ActiveMaxQ]) {
cpi->pb.ThisFrameQualityValue =
(ogg_uint32_t)cpi->pb.QThreshTable[cpi->Configuration.ActiveMaxQ];
}
if(cpi->FixedQ) {
/* A fixed quantizer overrides the search result for every frame type. */
cpi->pb.ThisFrameQualityValue = cpi->FixedQ;
}
/* If the quantizer value has changed then re-initialise it */
if ( cpi->pb.ThisFrameQualityValue != cpi->pb.LastFrameQualityValue ) {
/* Initialise quality tables. */
UpdateQC( cpi, cpi->pb.ThisFrameQualityValue );
cpi->pb.LastFrameQualityValue = cpi->pb.ThisFrameQualityValue;
}
}
void CopyBackExtraFrags(CP_INSTANCE *cpi){
ogg_uint32_t i,j;
unsigned char * SrcPtr;
unsigned char * DestPtr;
ogg_uint32_t PlaneLineStep;
ogg_uint32_t PixelIndex;
/* Copy back for Y plane. */
PlaneLineStep = cpi->pb.info.width;
for ( i = 0; i < cpi->pb.YPlaneFragments; i++ ) {
/* We are only interested in updated fragments. */
if ( cpi->extra_fragments[i] ) {
/* Get the start index for the fragment. */
PixelIndex = cpi->pb.pixel_index_table[i];
SrcPtr = &cpi->yuv1ptr[PixelIndex];
DestPtr = &cpi->ConvDestBuffer[PixelIndex];
for ( j = 0; j < VFRAGPIXELS; j++ ) {
memcpy( DestPtr, SrcPtr, HFRAGPIXELS);
SrcPtr += PlaneLineStep;
DestPtr += PlaneLineStep;
}
}
}
/* Now the U and V planes */
PlaneLineStep = cpi->pb.info.width / 2;
for ( i = cpi->pb.YPlaneFragments;
i < (cpi->pb.YPlaneFragments + (2 * cpi->pb.UVPlaneFragments)) ;
i++ ) {
/* We are only interested in updated fragments. */
if ( cpi->extra_fragments[i] ) {
/* Get the start index for the fragment. */
PixelIndex = cpi->pb.pixel_index_table[i];
SrcPtr = &cpi->yuv1ptr[PixelIndex];
DestPtr = &cpi->ConvDestBuffer[PixelIndex];
for ( j = 0; j < VFRAGPIXELS; j++ ) {
memcpy( DestPtr, SrcPtr, HFRAGPIXELS);
SrcPtr += PlaneLineStep;
DestPtr += PlaneLineStep;
}
}
}
}
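
The table search in RegulateQ above can be illustrated in isolation. The following is a minimal sketch, not the encoder's code: `pick_q_index` and the `pred` array are hypothetical stand-ins for the loop over `GetEstimatedBpb( cpi, cpi->pb.QThreshTable[i] )`, and the sketch assumes the predicted cost never decreases as the index grows. It also adds a guard for index 0 that the original loop does not have.

```c
/* Hypothetical stand-in for the RegulateQ search: pred[i] is the predicted
   bytes per unit of score at table index i (assumed non-decreasing in i),
   target is the byte budget per unit of score. */
unsigned pick_q_index(const double *pred, unsigned n, double target)
{
  unsigned best = n - 1;       /* default: last table entry, as in RegulateQ */
  double last_err = 10000.0;   /* error of the previous, undershooting entry */
  unsigned i;
  for (i = 0; i < n; i++) {
    if (pred[i] > target) {
      /* First overshoot: keep it if it is at least as close to the target
         as the previous undershoot, otherwise step back one entry.
         The i == 0 test is an added guard against stepping below index 0. */
      if (i == 0 || (pred[i] - target) <= last_err)
        best = i;
      else
        best = i - 1;
      break;
    }
    last_err = target - pred[i];
  }
  return best;
}
```

With predictions of 1, 2, 4 and 8 bytes per unit, a target of 2.9 selects index 1 (the undershoot is closer) and a target of 3.5 selects index 2 (the overshoot is closer). If no entry overshoots the target, the last index is returned.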

View file

@ -1,89 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: pb.c 14372 2008-01-05 23:52:28Z giles $
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include "codec_internal.h"
void ClearTmpBuffers(PB_INSTANCE * pbi){
if(pbi->ReconDataBuffer)
_ogg_free(pbi->ReconDataBuffer);
if(pbi->DequantBuffer)
_ogg_free(pbi->DequantBuffer);
if(pbi->TmpDataBuffer)
_ogg_free(pbi->TmpDataBuffer);
if(pbi->TmpReconBuffer)
_ogg_free(pbi->TmpReconBuffer);
pbi->ReconDataBuffer=0;
pbi->DequantBuffer = 0;
pbi->TmpDataBuffer = 0;
pbi->TmpReconBuffer = 0;
}
void InitTmpBuffers(PB_INSTANCE * pbi){
/* clear any existing info */
ClearTmpBuffers(pbi);
/* Allocate all of our temporary buffers (64 entries each) */
pbi->ReconDataBuffer =
_ogg_malloc(64*sizeof(*pbi->ReconDataBuffer));
pbi->DequantBuffer =
_ogg_malloc(64 * sizeof(*pbi->DequantBuffer));
pbi->TmpDataBuffer =
_ogg_malloc(64 * sizeof(*pbi->TmpDataBuffer));
pbi->TmpReconBuffer =
_ogg_malloc(64 * sizeof(*pbi->TmpReconBuffer));
}
void ClearPBInstance(PB_INSTANCE *pbi){
if(pbi){
ClearTmpBuffers(pbi);
if (pbi->opb) {
_ogg_free(pbi->opb);
}
}
}
void InitPBInstance(PB_INSTANCE *pbi){
/* initialize whole structure to 0 */
memset(pbi, 0, sizeof(*pbi));
InitTmpBuffers(pbi);
/* allocate memory for the oggpack_buffer */
pbi->opb = _ogg_malloc(sizeof(oggpack_buffer));
/* variables needing initialization (not being set to 0) */
pbi->ModifierPointer[0] = &pbi->Modifier[0][255];
pbi->ModifierPointer[1] = &pbi->Modifier[1][255];
pbi->ModifierPointer[2] = &pbi->Modifier[2][255];
pbi->ModifierPointer[3] = &pbi->Modifier[3][255];
pbi->DecoderErrorCode = 0;
pbi->KeyFrameType = DCT_KEY_FRAME;
pbi->FramesHaveBeenSkipped = 0;
}

View file

@ -1,951 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: pp.c 15057 2008-06-22 21:07:32Z xiphmont $
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include "codec_internal.h"
#include "pp.h"
#include "dsp.h"
#define MAX(a, b) (((a)>(b))?(a):(b))
#define MIN(a, b) (((a)<(b))?(a):(b))
#define PP_QUALITY_THRESH 49
static const ogg_int32_t SharpenModifier[ Q_TABLE_SIZE ] =
{ -12, -11, -10, -10, -9, -9, -9, -9,
-6, -6, -6, -6, -6, -6, -6, -6,
-4, -4, -4, -4, -4, -4, -4, -4,
-2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, -2, -2, -2,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0
};
static const ogg_uint32_t DcQuantScaleV1[ Q_TABLE_SIZE ] = {
22, 20, 19, 18, 17, 17, 16, 16,
15, 15, 14, 14, 13, 13, 12, 12,
11, 11, 10, 10, 9, 9, 9, 8,
8, 8, 7, 7, 7, 6, 6, 6,
6, 5, 5, 5, 5, 4, 4, 4,
4, 4, 3, 3, 3, 3, 3, 3,
3, 2, 2, 2, 2, 2, 2, 2,
2, 1, 1, 1, 1, 1, 1, 1
};
static const ogg_uint32_t * const DeringModifierV1=DcQuantScaleV1;
static void PClearFrameInfo(PP_INSTANCE * ppi){
int i;
if(ppi->ScanPixelIndexTable) _ogg_free(ppi->ScanPixelIndexTable);
ppi->ScanPixelIndexTable=0;
if(ppi->ScanDisplayFragments) _ogg_free(ppi->ScanDisplayFragments);
ppi->ScanDisplayFragments=0;
for(i = 0 ; i < MAX_PREV_FRAMES ; i ++)
if(ppi->PrevFragments[i]){
_ogg_free(ppi->PrevFragments[i]);
ppi->PrevFragments[i]=0;
}
if(ppi->FragScores) _ogg_free(ppi->FragScores);
ppi->FragScores=0;
if(ppi->SameGreyDirPixels) _ogg_free(ppi->SameGreyDirPixels);
ppi->SameGreyDirPixels=0;
if(ppi->FragDiffPixels) _ogg_free(ppi->FragDiffPixels);
ppi->FragDiffPixels=0;
if(ppi->BarBlockMap) _ogg_free(ppi->BarBlockMap);
ppi->BarBlockMap=0;
if(ppi->TmpCodedMap) _ogg_free(ppi->TmpCodedMap);
ppi->TmpCodedMap=0;
if(ppi->RowChangedPixels) _ogg_free(ppi->RowChangedPixels);
ppi->RowChangedPixels=0;
if(ppi->PixelScores) _ogg_free(ppi->PixelScores);
ppi->PixelScores=0;
if(ppi->PixelChangedMap) _ogg_free(ppi->PixelChangedMap);
ppi->PixelChangedMap=0;
if(ppi->ChLocals) _ogg_free(ppi->ChLocals);
ppi->ChLocals=0;
if(ppi->yuv_differences) _ogg_free(ppi->yuv_differences);
ppi->yuv_differences=0;
}
void PInitFrameInfo(PP_INSTANCE * ppi){
int i;
PClearFrameInfo(ppi);
ppi->ScanPixelIndexTable =
_ogg_malloc(ppi->ScanFrameFragments*sizeof(*ppi->ScanPixelIndexTable));
ppi->ScanDisplayFragments =
_ogg_malloc(ppi->ScanFrameFragments*sizeof(*ppi->ScanDisplayFragments));
for(i = 0 ; i < MAX_PREV_FRAMES ; i ++)
ppi->PrevFragments[i] =
_ogg_malloc(ppi->ScanFrameFragments*sizeof(*ppi->PrevFragments[i]));
ppi->FragScores =
_ogg_malloc(ppi->ScanFrameFragments*sizeof(*ppi->FragScores));
ppi->SameGreyDirPixels =
_ogg_malloc(ppi->ScanFrameFragments*sizeof(*ppi->SameGreyDirPixels));
ppi->FragDiffPixels =
_ogg_malloc(ppi->ScanFrameFragments*sizeof(*ppi->FragDiffPixels));
ppi->BarBlockMap=
_ogg_malloc(3 * ppi->ScanHFragments*sizeof(*ppi->BarBlockMap));
ppi->TmpCodedMap =
_ogg_malloc(ppi->ScanHFragments*sizeof(*ppi->TmpCodedMap));
ppi->RowChangedPixels =
_ogg_malloc(3 * ppi->ScanConfig.VideoFrameHeight*
sizeof(*ppi->RowChangedPixels));
ppi->PixelScores =
_ogg_malloc(ppi->ScanConfig.VideoFrameWidth*
sizeof(*ppi->PixelScores) * PSCORE_CB_ROWS);
ppi->PixelChangedMap =
_ogg_malloc(ppi->ScanConfig.VideoFrameWidth*
sizeof(*ppi->PixelChangedMap) * PMAP_CB_ROWS);
ppi->ChLocals =
_ogg_malloc(ppi->ScanConfig.VideoFrameWidth*
sizeof(*ppi->ChLocals) * CHLOCALS_CB_ROWS);
ppi->yuv_differences =
_ogg_malloc(ppi->ScanConfig.VideoFrameWidth*
sizeof(*ppi->yuv_differences) * YDIFF_CB_ROWS);
}
void ClearPPInstance(PP_INSTANCE *ppi){
PClearFrameInfo(ppi);
}
void InitPPInstance(PP_INSTANCE *ppi, DspFunctions *funcs){
memset(ppi,0,sizeof(*ppi));
memcpy(&ppi->dsp, funcs, sizeof(DspFunctions));
/* Initializations */
ppi->PrevFrameLimit = 3; /* Must not exceed MAX_PREV_FRAMES (Note
that this number includes the current
frame so "1 = no effect") */
/* Scan control variables. */
ppi->HFragPixels = 8;
ppi->VFragPixels = 8;
ppi->SRFGreyThresh = 4;
ppi->SRFColThresh = 5;
ppi->NoiseSupLevel = 3;
ppi->SgcLevelThresh = 3;
ppi->SuvcLevelThresh = 4;
/* Variables controlling S.A.D. breakouts. */
ppi->GrpLowSadThresh = 10;
ppi->GrpHighSadThresh = 64;
ppi->PrimaryBlockThreshold = 5;
ppi->SgcThresh = 16; /* (Default values for 8x8 blocks). */
ppi->UVBlockThreshCorrection = 1.25;
ppi->UVSgcCorrection = 1.5;
ppi->MaxLineSearchLen = MAX_SEARCH_LINE_LEN;
}
static void DeringBlockStrong(unsigned char *SrcPtr,
unsigned char *DstPtr,
ogg_int32_t Pitch,
ogg_uint32_t FragQIndex,
const ogg_uint32_t *QuantScale){
ogg_int16_t UDMod[72];
ogg_int16_t LRMod[72];
unsigned int j,k,l;
const unsigned char * Src;
unsigned int QValue = QuantScale[FragQIndex];
unsigned char p;
unsigned char pl;
unsigned char pr;
unsigned char pu;
unsigned char pd;
int al;
int ar;
int au;
int ad;
int atot;
int B;
int newVal;
const unsigned char *curRow = SrcPtr - 1; /* avoid negative array indexes */
unsigned char *dstRow = DstPtr;
const unsigned char *lastRow = SrcPtr-Pitch;
const unsigned char *nextRow = SrcPtr+Pitch;
unsigned int rowOffset = 0;
unsigned int round = (1<<6);
int High;
int Low;
int TmpMod;
int Sharpen = SharpenModifier[FragQIndex];
High = 3 * QValue;
if(High>32)High=32;
Low = 0;
/* Initialize the Mod Data */
Src = SrcPtr-Pitch;
for(k=0;k<9;k++){
for(j=0;j<8;j++){
TmpMod = 32 + QValue - (abs(Src[j+Pitch]-Src[j]));
if(TmpMod< -64)
TmpMod = Sharpen;
else if(TmpMod<Low)
TmpMod = Low;
else if(TmpMod>High)
TmpMod = High;
UDMod[k*8+j] = (ogg_int16_t)TmpMod;
}
Src +=Pitch;
}
Src = SrcPtr-1;
for(k=0;k<8;k++){
for(j=0;j<9;j++){
TmpMod = 32 + QValue - (abs(Src[j+1]-Src[j]));
if(TmpMod< -64 )
TmpMod = Sharpen;
else if(TmpMod<0)
TmpMod = Low;
else if(TmpMod>High)
TmpMod = High;
LRMod[k*9+j] = (ogg_int16_t)TmpMod;
}
Src+=Pitch;
}
for(k=0;k<8;k++){
/* When this function is called with the same buffer for source and
destination, an intermediate buffer is used to hold the eight pixel
values before they are written to the destination (i.e. before the
source is overwritten in that special case), so that the C and MMX
versions produce consistent results. */
for(l=0;l<8;l++){
atot = 128;
B = round;
p = curRow[ rowOffset +l +1];
pl = curRow[ rowOffset +l];
al = LRMod[k*9+l];
atot -= al;
B += al * pl;
pu = lastRow[ rowOffset +l];
au = UDMod[k*8+l];
atot -= au;
B += au * pu;
pd = nextRow[ rowOffset +l];
ad = UDMod[(k+1)*8+l];
atot -= ad;
B += ad * pd;
pr = curRow[ rowOffset +l+2];
ar = LRMod[k*9+l+1];
atot -= ar;
B += ar * pr;
newVal = ( atot * p + B) >> 7;
dstRow[ rowOffset +l]= clamp255( newVal );
}
rowOffset += Pitch;
}
}
static void DeringBlockWeak(unsigned char *SrcPtr,
unsigned char *DstPtr,
ogg_int32_t Pitch,
ogg_uint32_t FragQIndex,
const ogg_uint32_t *QuantScale){
ogg_int16_t UDMod[72];
ogg_int16_t LRMod[72];
unsigned int j,k;
const unsigned char * Src;
unsigned int QValue = QuantScale[FragQIndex];
unsigned char p;
unsigned char pl;
unsigned char pr;
unsigned char pu;
unsigned char pd;
int al;
int ar;
int au;
int ad;
int atot;
int B;
int newVal;
const unsigned char *curRow = SrcPtr-1;
unsigned char *dstRow = DstPtr;
const unsigned char *lastRow = SrcPtr-Pitch;
const unsigned char *nextRow = SrcPtr+Pitch;
unsigned int rowOffset = 0;
unsigned int round = (1<<6);
int High;
int Low;
int TmpMod;
int Sharpen = SharpenModifier[FragQIndex];
High = 3 * QValue;
if(High>24)
High=24;
Low = 0 ;
/* Initialize the Mod Data */
Src=SrcPtr-Pitch;
for(k=0;k<9;k++) {
for(j=0;j<8;j++) {
TmpMod = 32 + QValue - 2*(abs(Src[j+Pitch]-Src[j]));
if(TmpMod< -64)
TmpMod = Sharpen;
else if(TmpMod<Low)
TmpMod = Low;
else if(TmpMod>High)
TmpMod = High;
UDMod[k*8+j] = (ogg_int16_t)TmpMod;
}
Src +=Pitch;
}
Src = SrcPtr-1;
for(k=0;k<8;k++){
for(j=0;j<9;j++){
TmpMod = 32 + QValue - 2*(abs(Src[j+1]-Src[j]));
if(TmpMod< -64 )
TmpMod = Sharpen;
else if(TmpMod<Low)
TmpMod = Low;
else if(TmpMod>High)
TmpMod = High;
LRMod[k*9+j] = (ogg_int16_t)TmpMod;
}
Src+=Pitch;
}
for(k=0;k<8;k++) {
for(j=0;j<8;j++){
atot = 128;
B = round;
p = curRow[ rowOffset +j+1];
pl = curRow[ rowOffset +j];
al = LRMod[k*9+j];
atot -= al;
B += al * pl;
pu = lastRow[ rowOffset +j];
au = UDMod[k*8+j];
atot -= au;
B += au * pu;
pd = nextRow[ rowOffset +j];
ad = UDMod[(k+1)*8+j];
atot -= ad;
B += ad * pd;
pr = curRow[ rowOffset +j+2];
ar = LRMod[k*9+j+1];
atot -= ar;
B += ar * pr;
newVal = ( atot * p + B) >> 7;
dstRow[ rowOffset +j] = clamp255( newVal );
}
rowOffset += Pitch;
}
}
static void DeringFrame(PB_INSTANCE *pbi,
unsigned char *Src, unsigned char *Dst){
ogg_uint32_t col,row;
unsigned char *SrcPtr;
unsigned char *DestPtr;
ogg_uint32_t BlocksAcross,BlocksDown;
const ogg_uint32_t *QuantScale;
ogg_uint32_t Block;
ogg_uint32_t LineLength;
ogg_int32_t Thresh1,Thresh2,Thresh3,Thresh4;
Thresh1 = 384;
Thresh2 = 4 * Thresh1;
Thresh3 = 5 * Thresh2/4;
Thresh4 = 5 * Thresh2/2;
QuantScale = DeringModifierV1;
BlocksAcross = pbi->HFragments;
BlocksDown = pbi->VFragments;
SrcPtr = Src + pbi->ReconYDataOffset;
DestPtr = Dst + pbi->ReconYDataOffset;
LineLength = pbi->YStride;
Block = 0;
for ( row = 0 ; row < BlocksDown; row ++){
for (col = 0; col < BlocksAcross; col ++){
ogg_uint32_t Quality = pbi->FragQIndex[Block];
ogg_int32_t Variance = pbi->FragmentVariances[Block];
if( pbi->PostProcessingLevel >5 && Variance > Thresh3 ){
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
if( (col > 0 &&
pbi->FragmentVariances[Block-1] > Thresh4 ) ||
(col + 1 < BlocksAcross &&
pbi->FragmentVariances[Block+1] > Thresh4 ) ||
(row + 1 < BlocksDown &&
pbi->FragmentVariances[Block+BlocksAcross] > Thresh4) ||
(row > 0 &&
pbi->FragmentVariances[Block-BlocksAcross] > Thresh4) ){
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
}
} else if(Variance > Thresh2 ) {
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
} else if(Variance > Thresh1 ) {
DeringBlockWeak(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
} else {
dsp_copy8x8(pbi->dsp, SrcPtr + 8 * col, DestPtr + 8 * col, LineLength);
}
++Block;
}
SrcPtr += 8 * LineLength;
DestPtr += 8 * LineLength;
}
/* Then U */
BlocksAcross /= 2;
BlocksDown /= 2;
LineLength /= 2;
SrcPtr = Src + pbi->ReconUDataOffset;
DestPtr = Dst + pbi->ReconUDataOffset;
for ( row = 0 ; row < BlocksDown; row ++) {
for (col = 0; col < BlocksAcross; col ++) {
ogg_uint32_t Quality = pbi->FragQIndex[Block];
ogg_int32_t Variance = pbi->FragmentVariances[Block];
if( pbi->PostProcessingLevel >5 && Variance > Thresh4 ) {
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
}else if(Variance > Thresh2 ){
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
}else if(Variance > Thresh1 ){
DeringBlockWeak(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
}else{
dsp_copy8x8(pbi->dsp, SrcPtr + 8 * col, DestPtr + 8 * col, LineLength);
}
++Block;
}
SrcPtr += 8 * LineLength;
DestPtr += 8 * LineLength;
}
/* Then V */
SrcPtr = Src + pbi->ReconVDataOffset;
DestPtr = Dst + pbi->ReconVDataOffset;
for ( row = 0 ; row < BlocksDown; row ++){
for (col = 0; col < BlocksAcross; col ++){
ogg_uint32_t Quality = pbi->FragQIndex[Block];
ogg_int32_t Variance = pbi->FragmentVariances[Block];
if( pbi->PostProcessingLevel >5 && Variance > Thresh4 ) {
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
}else if(Variance > Thresh2 ){
DeringBlockStrong(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
}else if(Variance > Thresh1 ){
DeringBlockWeak(SrcPtr + 8 * col, DestPtr + 8 * col,
LineLength,Quality,QuantScale);
}else{
dsp_copy8x8(pbi->dsp, SrcPtr + 8 * col, DestPtr + 8 * col, LineLength);
}
++Block;
}
SrcPtr += 8 * LineLength;
DestPtr += 8 * LineLength;
}
}
void UpdateFragQIndex(PB_INSTANCE *pbi){
ogg_uint32_t ThisFrameQIndex;
ogg_uint32_t i;
/* Check this frame quality index */
ThisFrameQIndex = pbi->FrameQIndex;
/* It is not a key frame, so only reset those fragments that are coded */
for( i = 0; i < pbi->UnitFragments; i++ )
if( pbi->display_fragments[i])
pbi->FragQIndex[i] = ThisFrameQIndex;
}
static void DeblockLoopFilteredBand(PB_INSTANCE *pbi,
unsigned char *SrcPtr,
unsigned char *DesPtr,
ogg_uint32_t PlaneLineStep,
ogg_uint32_t FragsAcross,
ogg_uint32_t StartFrag,
const ogg_uint32_t *QuantScale){
ogg_uint32_t j,k;
ogg_uint32_t CurrentFrag=StartFrag;
ogg_int32_t QStep;
ogg_int32_t FLimit;
unsigned char *Src, *Des;
ogg_int32_t x[10];
ogg_int32_t Sum1, Sum2;
while(CurrentFrag < StartFrag + FragsAcross){
Src=SrcPtr+8*(CurrentFrag-StartFrag)-PlaneLineStep*5;
Des=DesPtr+8*(CurrentFrag-StartFrag)-PlaneLineStep*4;
QStep = QuantScale[pbi->FragQIndex[CurrentFrag+FragsAcross]];
FLimit = ( QStep * 3 ) >> 2;
for( j=0; j<8 ; j++){
x[0] = Src[0];
x[1] = Src[PlaneLineStep];
x[2] = Src[PlaneLineStep*2];
x[3] = Src[PlaneLineStep*3];
x[4] = Src[PlaneLineStep*4];
x[5] = Src[PlaneLineStep*5];
x[6] = Src[PlaneLineStep*6];
x[7] = Src[PlaneLineStep*7];
x[8] = Src[PlaneLineStep*8];
x[9] = Src[PlaneLineStep*9];
Sum1=Sum2=0;
for(k=1;k<=4;k++){
Sum1 += abs(x[k]-x[k-1]);
Sum2 += abs(x[k+4]-x[k+5]);
}
pbi->FragmentVariances[CurrentFrag] +=((Sum1>255)?255:Sum1);
pbi->FragmentVariances[CurrentFrag + FragsAcross] += ((Sum2>255)?255:Sum2);
if( Sum1 < FLimit &&
Sum2 < FLimit &&
(x[5] - x[4]) < QStep &&
(x[4] - x[5]) < QStep ){
/* low pass filtering (LPF7: 1 1 1 2 1 1 1) */
Des[0 ] = (x[0] + x[0] +x[0] + x[1] * 2 +
x[2] + x[3] +x[4] + 4) >> 3;
Des[PlaneLineStep ] = (x[0] + x[0] +x[1] + x[2] * 2 +
x[3] + x[4] +x[5] + 4) >> 3;
Des[PlaneLineStep*2] = (x[0] + x[1] +x[2] + x[3] * 2 +
x[4] + x[5] +x[6] + 4) >> 3;
Des[PlaneLineStep*3] = (x[1] + x[2] +x[3] + x[4] * 2 +
x[5] + x[6] +x[7] + 4) >> 3;
Des[PlaneLineStep*4] = (x[2] + x[3] +x[4] + x[5] * 2 +
x[6] + x[7] +x[8] + 4) >> 3;
Des[PlaneLineStep*5] = (x[3] + x[4] +x[5] + x[6] * 2 +
x[7] + x[8] +x[9] + 4) >> 3;
Des[PlaneLineStep*6] = (x[4] + x[5] +x[6] + x[7] * 2 +
x[8] + x[9] +x[9] + 4) >> 3;
Des[PlaneLineStep*7] = (x[5] + x[6] +x[7] + x[8] * 2 +
x[9] + x[9] +x[9] + 4) >> 3;
}else {
/* copy the pixels to destination */
Des[0 ]= (unsigned char)x[1];
Des[PlaneLineStep ]= (unsigned char)x[2];
Des[PlaneLineStep*2]= (unsigned char)x[3];
Des[PlaneLineStep*3]= (unsigned char)x[4];
Des[PlaneLineStep*4]= (unsigned char)x[5];
Des[PlaneLineStep*5]= (unsigned char)x[6];
Des[PlaneLineStep*6]= (unsigned char)x[7];
Des[PlaneLineStep*7]= (unsigned char)x[8];
}
Src ++;
Des ++;
}
/* done with filtering the horizontal edge, now let's do the
vertical one */
/* skip the first one */
if(CurrentFrag==StartFrag)
CurrentFrag++;
else{
Des=DesPtr-8*PlaneLineStep+8*(CurrentFrag-StartFrag);
Src=Des-5;
Des-=4;
QStep = QuantScale[pbi->FragQIndex[CurrentFrag]];
FLimit = ( QStep * 3 ) >> 2;
for( j=0; j<8 ; j++){
x[0] = Src[0];
x[1] = Src[1];
x[2] = Src[2];
x[3] = Src[3];
x[4] = Src[4];
x[5] = Src[5];
x[6] = Src[6];
x[7] = Src[7];
x[8] = Src[8];
x[9] = Src[9];
Sum1=Sum2=0;
for(k=1;k<=4;k++){
Sum1 += abs(x[k]-x[k-1]);
Sum2 += abs(x[k+4]-x[k+5]);
}
pbi->FragmentVariances[CurrentFrag-1] += ((Sum1>255)?255:Sum1);
pbi->FragmentVariances[CurrentFrag] += ((Sum2>255)?255:Sum2);
if( Sum1 < FLimit &&
Sum2 < FLimit &&
(x[5] - x[4]) < QStep &&
(x[4] - x[5]) < QStep ){
/* low pass filtering (LPF7: 1 1 1 2 1 1 1) */
Des[0] = (x[0] + x[0] +x[0] + x[1] * 2 + x[2] + x[3] +x[4] + 4) >> 3;
Des[1] = (x[0] + x[0] +x[1] + x[2] * 2 + x[3] + x[4] +x[5] + 4) >> 3;
Des[2] = (x[0] + x[1] +x[2] + x[3] * 2 + x[4] + x[5] +x[6] + 4) >> 3;
Des[3] = (x[1] + x[2] +x[3] + x[4] * 2 + x[5] + x[6] +x[7] + 4) >> 3;
Des[4] = (x[2] + x[3] +x[4] + x[5] * 2 + x[6] + x[7] +x[8] + 4) >> 3;
Des[5] = (x[3] + x[4] +x[5] + x[6] * 2 + x[7] + x[8] +x[9] + 4) >> 3;
Des[6] = (x[4] + x[5] +x[6] + x[7] * 2 + x[8] + x[9] +x[9] + 4) >> 3;
Des[7] = (x[5] + x[6] +x[7] + x[8] * 2 + x[9] + x[9] +x[9] + 4) >> 3;
}
Src += PlaneLineStep;
Des += PlaneLineStep;
}
CurrentFrag ++;
}
}
}
static void DeblockVerticalEdgesInLoopFilteredBand(PB_INSTANCE *pbi,
unsigned char *SrcPtr,
unsigned char *DesPtr,
ogg_uint32_t PlaneLineStep,
ogg_uint32_t FragsAcross,
ogg_uint32_t StartFrag,
const ogg_uint32_t *QuantScale){
ogg_uint32_t j,k;
ogg_uint32_t CurrentFrag=StartFrag;
ogg_int32_t QStep;
ogg_int32_t FLimit;
unsigned char *Src, *Des;
ogg_int32_t x[10];
ogg_int32_t Sum1, Sum2;
while(CurrentFrag < StartFrag + FragsAcross-1) {
Src=SrcPtr+8*(CurrentFrag-StartFrag+1)-5;
Des=DesPtr+8*(CurrentFrag-StartFrag+1)-4;
QStep = QuantScale[pbi->FragQIndex[CurrentFrag+1]];
FLimit = ( QStep * 3)>>2 ;
for( j=0; j<8 ; j++){
x[0] = Src[0];
x[1] = Src[1];
x[2] = Src[2];
x[3] = Src[3];
x[4] = Src[4];
x[5] = Src[5];
x[6] = Src[6];
x[7] = Src[7];
x[8] = Src[8];
x[9] = Src[9];
Sum1=Sum2=0;
for(k=1;k<=4;k++){
Sum1 += abs(x[k]-x[k-1]);
Sum2 += abs(x[k+4]-x[k+5]);
}
pbi->FragmentVariances[CurrentFrag] += ((Sum1>255)?255:Sum1);
pbi->FragmentVariances[CurrentFrag+1] += ((Sum2>255)?255:Sum2);
if( Sum1 < FLimit &&
Sum2 < FLimit &&
(x[5] - x[4]) < QStep &&
(x[4] - x[5]) < QStep ){
/* low pass filtering (LPF7: 1 1 1 2 1 1 1) */
Des[0] = (x[0] + x[0] +x[0] + x[1] * 2 + x[2] + x[3] +x[4] + 4) >> 3;
Des[1] = (x[0] + x[0] +x[1] + x[2] * 2 + x[3] + x[4] +x[5] + 4) >> 3;
Des[2] = (x[0] + x[1] +x[2] + x[3] * 2 + x[4] + x[5] +x[6] + 4) >> 3;
Des[3] = (x[1] + x[2] +x[3] + x[4] * 2 + x[5] + x[6] +x[7] + 4) >> 3;
Des[4] = (x[2] + x[3] +x[4] + x[5] * 2 + x[6] + x[7] +x[8] + 4) >> 3;
Des[5] = (x[3] + x[4] +x[5] + x[6] * 2 + x[7] + x[8] +x[9] + 4) >> 3;
Des[6] = (x[4] + x[5] +x[6] + x[7] * 2 + x[8] + x[9] +x[9] + 4) >> 3;
Des[7] = (x[5] + x[6] +x[7] + x[8] * 2 + x[9] + x[9] +x[9] + 4) >> 3;
}
Src +=PlaneLineStep;
Des +=PlaneLineStep;
}
CurrentFrag ++;
}
}
static void DeblockPlane(PB_INSTANCE *pbi,
unsigned char *SourceBuffer,
unsigned char *DestinationBuffer,
ogg_uint32_t Channel ){
ogg_uint32_t i,k;
ogg_uint32_t PlaneLineStep=0;
ogg_uint32_t StartFrag =0;
ogg_uint32_t PixelIndex=0;
unsigned char * SrcPtr=0, * DesPtr=0;
ogg_uint32_t FragsAcross=0;
ogg_uint32_t FragsDown=0;
const ogg_uint32_t *QuantScale=0;
switch( Channel ){
case 0:
/* Get the parameters */
PlaneLineStep = pbi->YStride;
FragsAcross = pbi->HFragments;
FragsDown = pbi->VFragments;
StartFrag = 0;
PixelIndex = pbi->ReconYDataOffset;
SrcPtr = & SourceBuffer[PixelIndex];
DesPtr = & DestinationBuffer[PixelIndex];
break;
case 1:
/* Get the parameters */
PlaneLineStep = pbi->UVStride;
FragsAcross = pbi->HFragments / 2;
FragsDown = pbi->VFragments / 2;
StartFrag = pbi->YPlaneFragments;
PixelIndex = pbi->ReconUDataOffset;
SrcPtr = & SourceBuffer[PixelIndex];
DesPtr = & DestinationBuffer[PixelIndex];
break;
default:
/* Get the parameters */
PlaneLineStep = pbi->UVStride;
FragsAcross = pbi->HFragments / 2;
FragsDown = pbi->VFragments / 2;
StartFrag = pbi->YPlaneFragments + pbi->UVPlaneFragments;
PixelIndex = pbi->ReconVDataOffset;
SrcPtr = & SourceBuffer[PixelIndex];
DesPtr = & DestinationBuffer[PixelIndex];
break;
}
QuantScale = DcQuantScaleV1;
for(i=0;i<4;i++)
memcpy(DesPtr+i*PlaneLineStep, SrcPtr+i*PlaneLineStep, PlaneLineStep);
k = 1;
while( k < FragsDown ){
SrcPtr += 8*PlaneLineStep;
DesPtr += 8*PlaneLineStep;
/* Filter both the horizontal and vertical block edges inside the band */
DeblockLoopFilteredBand(pbi, SrcPtr, DesPtr, PlaneLineStep,
FragsAcross, StartFrag, QuantScale);
/* Move Pointers */
StartFrag += FragsAcross;
k ++;
}
/* The Last band */
for(i=0;i<4;i++)
memcpy(DesPtr+(i+4)*PlaneLineStep,
SrcPtr+(i+4)*PlaneLineStep,
PlaneLineStep);
DeblockVerticalEdgesInLoopFilteredBand(pbi,SrcPtr,DesPtr,PlaneLineStep,
FragsAcross,StartFrag,QuantScale);
}
static void DeblockFrame(PB_INSTANCE *pbi, unsigned char *SourceBuffer,
unsigned char *DestinationBuffer){
memset(pbi->FragmentVariances, 0 , sizeof(ogg_int32_t) * pbi->UnitFragments);
UpdateFragQIndex(pbi);
/* Y */
DeblockPlane( pbi, SourceBuffer, DestinationBuffer, 0);
/* U */
DeblockPlane( pbi, SourceBuffer, DestinationBuffer, 1);
/* V */
DeblockPlane( pbi, SourceBuffer, DestinationBuffer, 2);
}
void PostProcess(PB_INSTANCE *pbi){
switch (pbi->PostProcessingLevel){
case 8:
/* on a slow machine, use a simpler and faster deblocking filter */
DeblockFrame(pbi, pbi->LastFrameRecon,pbi->PostProcessBuffer);
break;
case 6:
DeblockFrame(pbi, pbi->LastFrameRecon,pbi->PostProcessBuffer);
UpdateUMVBorder(pbi, pbi->PostProcessBuffer );
DeringFrame(pbi, pbi->PostProcessBuffer, pbi->PostProcessBuffer);
break;
case 5:
DeblockFrame(pbi, pbi->LastFrameRecon,pbi->PostProcessBuffer);
UpdateUMVBorder(pbi, pbi->PostProcessBuffer );
DeringFrame(pbi, pbi->PostProcessBuffer, pbi->PostProcessBuffer);
break;
case 4:
DeblockFrame(pbi, pbi->LastFrameRecon, pbi->PostProcessBuffer);
break;
case 1:
UpdateFragQIndex(pbi);
break;
case 0:
break;
default:
DeblockFrame(pbi, pbi->LastFrameRecon, pbi->PostProcessBuffer);
UpdateUMVBorder(pbi, pbi->PostProcessBuffer );
DeringFrame(pbi, pbi->PostProcessBuffer, pbi->PostProcessBuffer);
break;
}
}
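
The eight unrolled "LPF7" lines in DeblockLoopFilteredBand and DeblockVerticalEdgesInLoopFilteredBand all apply the same (1 1 1 2 1 1 1)/8 kernel, with the end samples repeated where the window runs past the ten inputs. The sketch below expresses that as a loop for illustration; `lpf7_edge` is a hypothetical name, and the function only covers the filtering branch, not the `FLimit`/`QStep` test that decides whether to filter.

```c
/* Sketch of the 7-tap (1 1 1 2 1 1 1)/8 low-pass applied across a block
   edge. x holds 10 samples straddling the edge (x[4] | x[5]); out receives
   the 8 filtered values that replace x[1]..x[8]. Taps that would fall
   outside x[0]..x[9] reuse the nearest end sample. */
void lpf7_edge(const int x[10], unsigned char out[8])
{
  int i, k;
  for (i = 0; i < 8; i++) {
    int c = i + 1;         /* centre sample for this output */
    int sum = x[c] + 4;    /* extra centre weight plus the rounding term */
    for (k = -3; k <= 3; k++) {
      int idx = c + k;
      if (idx < 0) idx = 0;
      if (idx > 9) idx = 9;
      sum += x[idx];
    }
    out[i] = (unsigned char)(sum >> 3);
  }
}
```

The weights sum to 8, so a flat run of samples passes through unchanged, while a hard step at the edge is spread over the neighbouring pixels.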

View file

@ -1,48 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: pp.h 13884 2007-09-22 08:38:10Z giles $
********************************************************************/
/* Constants. */
#define INTERNAL_BLOCK_HEIGHT 8
#define INTERNAL_BLOCK_WIDTH 8
/* NEW Line search values. */
#define UP 0
#define DOWN 1
#define LEFT 2
#define RIGHT 3
#define FIRST_ROW 0
#define NOT_EDGE_ROW 1
#define LAST_ROW 2
#define YDIFF_CB_ROWS (INTERNAL_BLOCK_HEIGHT * 3)
#define CHLOCALS_CB_ROWS (INTERNAL_BLOCK_HEIGHT * 3)
#define PMAP_CB_ROWS (INTERNAL_BLOCK_HEIGHT * 3)
#define PSCORE_CB_ROWS (INTERNAL_BLOCK_HEIGHT * 4)
/* Status values in block coding map */
#define CANDIDATE_BLOCK_LOW -2
#define CANDIDATE_BLOCK -1
#define BLOCK_NOT_CODED 0
#define BLOCK_CODED_BAR 3
#define BLOCK_CODED_SGC 4
#define BLOCK_CODED_LOW 4
#define BLOCK_CODED 5
#define MAX_PREV_FRAMES 16
#define MAX_SEARCH_LINE_LEN 7

View file

@ -1,43 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: quant_lookup.h 13884 2007-09-22 08:38:10Z giles $
********************************************************************/
#include "codec_internal.h"
#define MIN16 ((1<<16)-1)
#define SHIFT16 (1<<16)
#define MIN_LEGAL_QUANT_ENTRY 8
#define MIN_DEQUANT_VAL 2
#define IDCT_SCALE_FACTOR 2 /* Shift left bits to improve IDCT precision */
#define OLD_SCHEME 1
/******************************
* lookup table for DCT coefficient zig-zag ordering
* ****************************/
static const ogg_uint32_t dezigzag_index[64] = {
0, 1, 8, 16, 9, 2, 3, 10,
17, 24, 32, 25, 18, 11, 4, 5,
12, 19, 26, 33, 40, 48, 41, 34,
27, 20, 13, 6, 7, 14, 21, 28,
35, 42, 49, 56, 57, 50, 43, 36,
29, 22, 15, 23, 30, 37, 44, 51,
58, 59, 52, 45, 38, 31, 39, 46,
53, 60, 61, 54, 47, 55, 62, 63
};
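
As an illustration of how a table like dezigzag_index is typically applied, the sketch below scatters 64 coefficients held in zig-zag (coding) order into an 8x8 block in raster order. `dezigzag_block` and `zz` are hypothetical names, the table is repeated with a plain C type so the sketch builds without the Ogg headers, and the exact call sites in the codec are not shown in this header, so treat this as an assumption about usage.

```c
/* Copy of the zig-zag table above with a plain C element type. */
static const unsigned char zz[64] = {
   0,  1,  8, 16,  9,  2,  3, 10,
  17, 24, 32, 25, 18, 11,  4,  5,
  12, 19, 26, 33, 40, 48, 41, 34,
  27, 20, 13,  6,  7, 14, 21, 28,
  35, 42, 49, 56, 57, 50, 43, 36,
  29, 22, 15, 23, 30, 37, 44, 51,
  58, 59, 52, 45, 38, 31, 39, 46,
  53, 60, 61, 54, 47, 55, 62, 63
};

/* Move coefficient i of the coded (zig-zag) sequence to its raster
   position zz[i] inside the 8x8 block (row = zz[i] / 8, col = zz[i] % 8). */
void dezigzag_block(const short coded[64], short block[64])
{
  int i;
  for (i = 0; i < 64; i++)
    block[zz[i]] = coded[i];
}
```

Because the table is a permutation of 0..63, every raster position is written exactly once, so the output block needs no prior clearing.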

View file

@ -1,110 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: reconstruct.c 13884 2007-09-22 08:38:10Z giles $
********************************************************************/
#include "codec_internal.h"
static void copy8x8__c (unsigned char *src,
unsigned char *dest,
unsigned int stride)
{
int j;
for ( j = 0; j < 8; j++ ){
((ogg_uint32_t*)dest)[0] = ((ogg_uint32_t*)src)[0];
((ogg_uint32_t*)dest)[1] = ((ogg_uint32_t*)src)[1];
src+=stride;
dest+=stride;
}
}
static void recon_intra8x8__c (unsigned char *ReconPtr, ogg_int16_t *ChangePtr,
ogg_uint32_t LineStep)
{
ogg_uint32_t i;
for (i = 8; i; i--){
/* Convert the data back to 8 bit unsigned */
/* Saturate the output to unsigned 8 bit values */
ReconPtr[0] = clamp255( ChangePtr[0] + 128 );
ReconPtr[1] = clamp255( ChangePtr[1] + 128 );
ReconPtr[2] = clamp255( ChangePtr[2] + 128 );
ReconPtr[3] = clamp255( ChangePtr[3] + 128 );
ReconPtr[4] = clamp255( ChangePtr[4] + 128 );
ReconPtr[5] = clamp255( ChangePtr[5] + 128 );
ReconPtr[6] = clamp255( ChangePtr[6] + 128 );
ReconPtr[7] = clamp255( ChangePtr[7] + 128 );
ReconPtr += LineStep;
ChangePtr += 8;
}
}
static void recon_inter8x8__c (unsigned char *ReconPtr, unsigned char *RefPtr,
ogg_int16_t *ChangePtr, ogg_uint32_t LineStep)
{
ogg_uint32_t i;
for (i = 8; i; i--){
ReconPtr[0] = clamp255(RefPtr[0] + ChangePtr[0]);
ReconPtr[1] = clamp255(RefPtr[1] + ChangePtr[1]);
ReconPtr[2] = clamp255(RefPtr[2] + ChangePtr[2]);
ReconPtr[3] = clamp255(RefPtr[3] + ChangePtr[3]);
ReconPtr[4] = clamp255(RefPtr[4] + ChangePtr[4]);
ReconPtr[5] = clamp255(RefPtr[5] + ChangePtr[5]);
ReconPtr[6] = clamp255(RefPtr[6] + ChangePtr[6]);
ReconPtr[7] = clamp255(RefPtr[7] + ChangePtr[7]);
ChangePtr += 8;
ReconPtr += LineStep;
RefPtr += LineStep;
}
}
static void recon_inter8x8_half__c (unsigned char *ReconPtr, unsigned char *RefPtr1,
unsigned char *RefPtr2, ogg_int16_t *ChangePtr,
ogg_uint32_t LineStep)
{
ogg_uint32_t i;
for (i = 8; i; i--){
ReconPtr[0] = clamp255((((int)RefPtr1[0] + (int)RefPtr2[0]) >> 1) + ChangePtr[0] );
ReconPtr[1] = clamp255((((int)RefPtr1[1] + (int)RefPtr2[1]) >> 1) + ChangePtr[1] );
ReconPtr[2] = clamp255((((int)RefPtr1[2] + (int)RefPtr2[2]) >> 1) + ChangePtr[2] );
ReconPtr[3] = clamp255((((int)RefPtr1[3] + (int)RefPtr2[3]) >> 1) + ChangePtr[3] );
ReconPtr[4] = clamp255((((int)RefPtr1[4] + (int)RefPtr2[4]) >> 1) + ChangePtr[4] );
ReconPtr[5] = clamp255((((int)RefPtr1[5] + (int)RefPtr2[5]) >> 1) + ChangePtr[5] );
ReconPtr[6] = clamp255((((int)RefPtr1[6] + (int)RefPtr2[6]) >> 1) + ChangePtr[6] );
ReconPtr[7] = clamp255((((int)RefPtr1[7] + (int)RefPtr2[7]) >> 1) + ChangePtr[7] );
ChangePtr += 8;
ReconPtr += LineStep;
RefPtr1 += LineStep;
RefPtr2 += LineStep;
}
}
void dsp_recon_init (DspFunctions *funcs, ogg_uint32_t cpu_flags)
{
funcs->copy8x8 = copy8x8__c;
funcs->recon_intra8x8 = recon_intra8x8__c;
funcs->recon_inter8x8 = recon_inter8x8__c;
funcs->recon_inter8x8_half = recon_inter8x8_half__c;
#if defined(USE_ASM)
if (cpu_flags & OC_CPU_X86_MMX) {
dsp_mmx_recon_init(funcs);
}
#endif
}
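
The unrolled reconstruction routines above all follow one pattern: form a prediction, add the residual, and saturate to 8 bits. The sketch below writes the half-pel case as plain loops to make that pattern explicit. `sketch_clamp255` is a hypothetical stand-in for `clamp255`, whose definition is not shown in this diff, and the function name and parameter types are assumptions for the example.

```c
/* Hypothetical stand-in for clamp255(): saturate an int to 0..255. */
static unsigned char sketch_clamp255(int v)
{
  return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Loop form of the half-pel inter reconstruction: the prediction is the
   truncated average of two reference pixels, the residual is added, and
   the sum is saturated to an unsigned byte. */
static void sketch_recon_inter8x8_half(unsigned char *recon,
                                       const unsigned char *ref1,
                                       const unsigned char *ref2,
                                       const short *change,
                                       unsigned int stride)
{
  int row, col;
  for (row = 0; row < 8; row++) {
    for (col = 0; col < 8; col++) {
      int pred = ((int)ref1[col] + (int)ref2[col]) >> 1;
      recon[col] = sketch_clamp255(pred + change[col]);
    }
    change += 8;      /* residuals are stored densely, 8 per row */
    recon  += stride; /* pixel rows are separated by the line stride */
    ref1   += stride;
    ref2   += stride;
  }
}
```

Passing the same pointer for `ref1` and `ref2` reduces this to the plain inter case, since the average of a pixel with itself is the pixel.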

File diff suppressed because it is too large

View file

@ -1,40 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: toplevel_lookup.h 13884 2007-09-22 08:38:10Z giles $
********************************************************************/
#include "codec_internal.h"
const ogg_uint32_t PriorKeyFrameWeight[KEY_FRAME_CONTEXT] = { 1,2,3,4,5 };
/* Data structures controlling addition of residue blocks */
const ogg_uint32_t ResidueErrorThresh[Q_TABLE_SIZE] = {
750, 700, 650, 600, 590, 580, 570, 560,
550, 540, 530, 520, 510, 500, 490, 480,
470, 460, 450, 440, 430, 420, 410, 400,
390, 380, 370, 360, 350, 340, 330, 320,
310, 300, 290, 280, 270, 260, 250, 245,
240, 235, 230, 225, 220, 215, 210, 205,
200, 195, 190, 185, 180, 175, 170, 165,
160, 155, 150, 145, 140, 135, 130, 130 };
const ogg_uint32_t ResidueBlockFactor[Q_TABLE_SIZE] = {
3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3,
2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2 };

View file

@ -1,409 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2008 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dct_decode_mmx.c 15400 2008-10-15 12:10:58Z tterribe $
********************************************************************/
#include <stdlib.h>
#include "../codec_internal.h"
#if defined(USE_ASM)
static const __attribute__((aligned(8),used)) ogg_int64_t OC_V3=
0x0003000300030003LL;
static const __attribute__((aligned(8),used)) ogg_int64_t OC_V4=
0x0004000400040004LL;
static void loop_filter_v(unsigned char *_pix,int _ystride,
const ogg_int16_t *_ll){
long esi;
_pix-=_ystride*2;
__asm__ __volatile__(
/*mm0=0*/
"pxor %%mm0,%%mm0\n\t"
/*esi=_ystride*3*/
"lea (%[ystride],%[ystride],2),%[s]\n\t"
/*mm7=_pix[0...8]*/
"movq (%[pix]),%%mm7\n\t"
/*mm4=_pix[0...8+_ystride*3]*/
"movq (%[pix],%[s]),%%mm4\n\t"
/*mm6=_pix[0...8]*/
"movq %%mm7,%%mm6\n\t"
/*Expand unsigned _pix[0...3] to 16 bits.*/
"punpcklbw %%mm0,%%mm6\n\t"
"movq %%mm4,%%mm5\n\t"
/*Expand unsigned _pix[4...8] to 16 bits.*/
"punpckhbw %%mm0,%%mm7\n\t"
/*Expand other arrays too.*/
"punpcklbw %%mm0,%%mm4\n\t"
"punpckhbw %%mm0,%%mm5\n\t"
/*mm7:mm6=_p[0...8]-_p[0...8+_ystride*3]:*/
"psubw %%mm4,%%mm6\n\t"
"psubw %%mm5,%%mm7\n\t"
/*mm5=mm4=_pix[0...8+_ystride]*/
"movq (%[pix],%[ystride]),%%mm4\n\t"
/*mm1=mm3=mm2=_pix[0...8+_ystride*2]*/
"movq (%[pix],%[ystride],2),%%mm2\n\t"
"movq %%mm4,%%mm5\n\t"
"movq %%mm2,%%mm3\n\t"
"movq %%mm2,%%mm1\n\t"
/*Expand these arrays.*/
"punpckhbw %%mm0,%%mm5\n\t"
"punpcklbw %%mm0,%%mm4\n\t"
"punpckhbw %%mm0,%%mm3\n\t"
"punpcklbw %%mm0,%%mm2\n\t"
/*Preload...*/
"movq %[OC_V3],%%mm0\n\t"
/*mm3:mm2=_pix[0...8+_ystride*2]-_pix[0...8+_ystride]*/
"psubw %%mm5,%%mm3\n\t"
"psubw %%mm4,%%mm2\n\t"
/*Scale by 3.*/
"pmullw %%mm0,%%mm3\n\t"
"pmullw %%mm0,%%mm2\n\t"
/*Preload...*/
"movq %[OC_V4],%%mm0\n\t"
/*f=mm3:mm2==_pix[0...8]-_pix[0...8+_ystride*3]+
3*(_pix[0...8+_ystride*2]-_pix[0...8+_ystride])*/
"paddw %%mm7,%%mm3\n\t"
"paddw %%mm6,%%mm2\n\t"
/*Add 4.*/
"paddw %%mm0,%%mm3\n\t"
"paddw %%mm0,%%mm2\n\t"
/*"Divide" by 8.*/
"psraw $3,%%mm3\n\t"
"psraw $3,%%mm2\n\t"
/*Now compute lflim of mm3:mm2 cf. Section 7.10 of the spec.*/
/*Free up mm5.*/
"packuswb %%mm5,%%mm4\n\t"
/*mm0=L L L L*/
"movq (%[ll]),%%mm0\n\t"
/*if(R_i<-2L||R_i>2L)R_i=0:*/
"movq %%mm2,%%mm5\n\t"
"pxor %%mm6,%%mm6\n\t"
"movq %%mm0,%%mm7\n\t"
"psubw %%mm0,%%mm6\n\t"
"psllw $1,%%mm7\n\t"
"psllw $1,%%mm6\n\t"
/*mm2==R_3 R_2 R_1 R_0*/
/*mm5==R_3 R_2 R_1 R_0*/
/*mm6==-2L -2L -2L -2L*/
/*mm7==2L 2L 2L 2L*/
"pcmpgtw %%mm2,%%mm7\n\t"
"pcmpgtw %%mm6,%%mm5\n\t"
"pand %%mm7,%%mm2\n\t"
"movq %%mm0,%%mm7\n\t"
"pand %%mm5,%%mm2\n\t"
"psllw $1,%%mm7\n\t"
"movq %%mm3,%%mm5\n\t"
/*mm3==R_7 R_6 R_5 R_4*/
/*mm5==R_7 R_6 R_5 R_4*/
/*mm6==-2L -2L -2L -2L*/
/*mm7==2L 2L 2L 2L*/
"pcmpgtw %%mm3,%%mm7\n\t"
"pcmpgtw %%mm6,%%mm5\n\t"
"pand %%mm7,%%mm3\n\t"
"movq %%mm0,%%mm7\n\t"
"pand %%mm5,%%mm3\n\t"
/*if(R_i<-L)R_i'=R_i+2L;
if(R_i>L)R_i'=R_i-2L;
if(R_i<-L||R_i>L)R_i=-R_i':*/
"psraw $1,%%mm6\n\t"
"movq %%mm2,%%mm5\n\t"
"psllw $1,%%mm7\n\t"
/*mm2==R_3 R_2 R_1 R_0*/
/*mm5==R_3 R_2 R_1 R_0*/
/*mm6==-L -L -L -L*/
/*mm0==L L L L*/
/*mm5=R_i>L?FF:00*/
"pcmpgtw %%mm0,%%mm5\n\t"
/*mm6=-L>R_i?FF:00*/
"pcmpgtw %%mm2,%%mm6\n\t"
/*mm7=R_i>L?2L:0*/
"pand %%mm5,%%mm7\n\t"
/*mm2=R_i>L?R_i-2L:R_i*/
"psubw %%mm7,%%mm2\n\t"
"movq %%mm0,%%mm7\n\t"
/*mm5=-L>R_i||R_i>L*/
"por %%mm6,%%mm5\n\t"
"psllw $1,%%mm7\n\t"
/*mm7=-L>R_i?2L:0*/
"pand %%mm6,%%mm7\n\t"
"pxor %%mm6,%%mm6\n\t"
/*mm2=-L>R_i?R_i+2L:R_i*/
"paddw %%mm7,%%mm2\n\t"
"psubw %%mm0,%%mm6\n\t"
/*mm5=-L>R_i||R_i>L?-R_i':0*/
"pand %%mm2,%%mm5\n\t"
"movq %%mm0,%%mm7\n\t"
/*mm2=-L>R_i||R_i>L?0:R_i*/
"psubw %%mm5,%%mm2\n\t"
"psllw $1,%%mm7\n\t"
/*mm2=-L>R_i||R_i>L?-R_i':R_i*/
"psubw %%mm5,%%mm2\n\t"
"movq %%mm3,%%mm5\n\t"
/*mm3==R_7 R_6 R_5 R_4*/
/*mm5==R_7 R_6 R_5 R_4*/
/*mm6==-L -L -L -L*/
/*mm0==L L L L*/
/*mm6=-L>R_i?FF:00*/
"pcmpgtw %%mm3,%%mm6\n\t"
/*mm5=R_i>L?FF:00*/
"pcmpgtw %%mm0,%%mm5\n\t"
/*mm7=R_i>L?2L:0*/
"pand %%mm5,%%mm7\n\t"
/*mm3=R_i>L?R_i-2L:R_i*/
"psubw %%mm7,%%mm3\n\t"
"psllw $1,%%mm0\n\t"
/*mm5=-L>R_i||R_i>L*/
"por %%mm6,%%mm5\n\t"
/*mm0=-L>R_i?2L:0*/
"pand %%mm6,%%mm0\n\t"
/*mm3=-L>R_i?R_i+2L:R_i*/
"paddw %%mm0,%%mm3\n\t"
/*mm5=-L>R_i||R_i>L?-R_i':0*/
"pand %%mm3,%%mm5\n\t"
/*mm3=-L>R_i||R_i>L?0:R_i*/
"psubw %%mm5,%%mm3\n\t"
/*mm3=-L>R_i||R_i>L?-R_i':R_i*/
"psubw %%mm5,%%mm3\n\t"
/*Unfortunately, there's no unsigned byte+signed byte with unsigned
saturation op code, so we have to promote things back to 16 bits.*/
"pxor %%mm0,%%mm0\n\t"
"movq %%mm4,%%mm5\n\t"
"punpcklbw %%mm0,%%mm4\n\t"
"punpckhbw %%mm0,%%mm5\n\t"
"movq %%mm1,%%mm6\n\t"
"punpcklbw %%mm0,%%mm1\n\t"
"punpckhbw %%mm0,%%mm6\n\t"
/*_pix[0...8+_ystride]+=R_i*/
"paddw %%mm2,%%mm4\n\t"
"paddw %%mm3,%%mm5\n\t"
/*_pix[0...8+_ystride*2]-=R_i*/
"psubw %%mm2,%%mm1\n\t"
"psubw %%mm3,%%mm6\n\t"
"packuswb %%mm5,%%mm4\n\t"
"packuswb %%mm6,%%mm1\n\t"
/*Write it back out.*/
"movq %%mm4,(%[pix],%[ystride])\n\t"
"movq %%mm1,(%[pix],%[ystride],2)\n\t"
:[s]"=&S"(esi)
:[pix]"r"(_pix),[ystride]"r"((long)_ystride),[ll]"r"(_ll),
[OC_V3]"m"(OC_V3),[OC_V4]"m"(OC_V4)
:"memory"
);
}
/*This code implements the bulk of loop_filter_h().
Data are striped p0 p1 p2 p3 ... p0 p1 p2 p3 ..., so in order to load all
four p0's to one register we must transpose the values in four mmx regs.
When half is done we repeat this for the rest.*/
static void loop_filter_h4(unsigned char *_pix,long _ystride,
const ogg_int16_t *_ll){
long esi;
long edi;
__asm__ __volatile__(
/*x x x x 3 2 1 0*/
"movd (%[pix]),%%mm0\n\t"
/*esi=_ystride*3*/
"lea (%[ystride],%[ystride],2),%[s]\n\t"
/*x x x x 7 6 5 4*/
"movd (%[pix],%[ystride]),%%mm1\n\t"
/*x x x x B A 9 8*/
"movd (%[pix],%[ystride],2),%%mm2\n\t"
/*x x x x F E D C*/
"movd (%[pix],%[s]),%%mm3\n\t"
/*mm0=7 3 6 2 5 1 4 0*/
"punpcklbw %%mm1,%%mm0\n\t"
/*mm2=F B E A D 9 C 8*/
"punpcklbw %%mm3,%%mm2\n\t"
/*mm1=7 3 6 2 5 1 4 0*/
"movq %%mm0,%%mm1\n\t"
/*mm0=F B 7 3 E A 6 2*/
"punpckhwd %%mm2,%%mm0\n\t"
/*mm1=D 9 5 1 C 8 4 0*/
"punpcklwd %%mm2,%%mm1\n\t"
"pxor %%mm7,%%mm7\n\t"
/*mm5=D 9 5 1 C 8 4 0*/
"movq %%mm1,%%mm5\n\t"
/*mm1=x C x 8 x 4 x 0==pix[0]*/
"punpcklbw %%mm7,%%mm1\n\t"
/*mm5=x D x 9 x 5 x 1==pix[1]*/
"punpckhbw %%mm7,%%mm5\n\t"
/*mm3=F B 7 3 E A 6 2*/
"movq %%mm0,%%mm3\n\t"
/*mm0=x E x A x 6 x 2==pix[2]*/
"punpcklbw %%mm7,%%mm0\n\t"
/*mm3=x F x B x 7 x 3==pix[3]*/
"punpckhbw %%mm7,%%mm3\n\t"
/*mm1=mm1-mm3==pix[0]-pix[3]*/
"psubw %%mm3,%%mm1\n\t"
/*Save a copy of pix[2] for later.*/
"movq %%mm0,%%mm4\n\t"
/*mm0=mm0-mm5==pix[2]-pix[1]*/
"psubw %%mm5,%%mm0\n\t"
/*Scale by 3.*/
"pmullw %[OC_V3],%%mm0\n\t"
/*f=mm0==_pix[0]-_pix[3]+3*(_pix[2]-_pix[1])*/
"paddw %%mm1,%%mm0\n\t"
/*Add 4.*/
"paddw %[OC_V4],%%mm0\n\t"
/*"Divide" by 8, producing the residuals R_i.*/
"psraw $3,%%mm0\n\t"
/*Now compute lflim of mm0 cf. Section 7.10 of the spec.*/
/*mm6=L L L L*/
"movq (%[ll]),%%mm6\n\t"
/*if(R_i<-2L||R_i>2L)R_i=0:*/
"movq %%mm0,%%mm1\n\t"
"pxor %%mm2,%%mm2\n\t"
"movq %%mm6,%%mm3\n\t"
"psubw %%mm6,%%mm2\n\t"
"psllw $1,%%mm3\n\t"
"psllw $1,%%mm2\n\t"
/*mm0==R_3 R_2 R_1 R_0*/
/*mm1==R_3 R_2 R_1 R_0*/
/*mm2==-2L -2L -2L -2L*/
/*mm3==2L 2L 2L 2L*/
"pcmpgtw %%mm0,%%mm3\n\t"
"pcmpgtw %%mm2,%%mm1\n\t"
"pand %%mm3,%%mm0\n\t"
"pand %%mm1,%%mm0\n\t"
/*if(R_i<-L)R_i'=R_i+2L;
if(R_i>L)R_i'=R_i-2L;
if(R_i<-L||R_i>L)R_i=-R_i':*/
"psraw $1,%%mm2\n\t"
"movq %%mm0,%%mm1\n\t"
"movq %%mm6,%%mm3\n\t"
/*mm0==R_3 R_2 R_1 R_0*/
/*mm1==R_3 R_2 R_1 R_0*/
/*mm2==-L -L -L -L*/
/*mm6==L L L L*/
/*mm2=-L>R_i?FF:00*/
"pcmpgtw %%mm0,%%mm2\n\t"
/*mm1=R_i>L?FF:00*/
"pcmpgtw %%mm6,%%mm1\n\t"
/*mm3=2L 2L 2L 2L*/
"psllw $1,%%mm3\n\t"
/*mm6=2L 2L 2L 2L*/
"psllw $1,%%mm6\n\t"
/*mm3=R_i>L?2L:0*/
"pand %%mm1,%%mm3\n\t"
/*mm6=-L>R_i?2L:0*/
"pand %%mm2,%%mm6\n\t"
/*mm0=R_i>L?R_i-2L:R_i*/
"psubw %%mm3,%%mm0\n\t"
/*mm1=-L>R_i||R_i>L*/
"por %%mm2,%%mm1\n\t"
/*mm0=-L>R_i?R_i+2L:R_i*/
"paddw %%mm6,%%mm0\n\t"
/*mm1=-L>R_i||R_i>L?R_i':0*/
"pand %%mm0,%%mm1\n\t"
/*mm0=-L>R_i||R_i>L?0:R_i*/
"psubw %%mm1,%%mm0\n\t"
/*mm0=-L>R_i||R_i>L?-R_i':R_i*/
"psubw %%mm1,%%mm0\n\t"
/*_pix[1]+=R_i;*/
"paddw %%mm0,%%mm5\n\t"
/*_pix[2]-=R_i;*/
"psubw %%mm0,%%mm4\n\t"
/*mm5=x x x x D 9 5 1*/
"packuswb %%mm7,%%mm5\n\t"
/*mm4=x x x x E A 6 2*/
"packuswb %%mm7,%%mm4\n\t"
/*mm5=E D A 9 6 5 2 1*/
"punpcklbw %%mm4,%%mm5\n\t"
/*edi=6 5 2 1*/
"movd %%mm5,%%edi\n\t"
"movw %%di,1(%[pix])\n\t"
/*Why is there such a big stall here?*/
"psrlq $32,%%mm5\n\t"
"shrl $16,%%edi\n\t"
"movw %%di,1(%[pix],%[ystride])\n\t"
/*edi=E D A 9*/
"movd %%mm5,%%edi\n\t"
"movw %%di,1(%[pix],%[ystride],2)\n\t"
"shrl $16,%%edi\n\t"
"movw %%di,1(%[pix],%[s])\n\t"
:[s]"=&S"(esi),[d]"=&D"(edi),
[pix]"+r"(_pix),[ystride]"+r"(_ystride),[ll]"+r"(_ll)
:[OC_V3]"m"(OC_V3),[OC_V4]"m"(OC_V4)
:"memory"
);
}
static void loop_filter_h(unsigned char *_pix,int _ystride,
const ogg_int16_t *_ll){
_pix-=2;
loop_filter_h4(_pix,_ystride,_ll);
loop_filter_h4(_pix+(_ystride<<2),_ystride,_ll);
}
static void loop_filter_mmx(PB_INSTANCE *pbi, int FLimit){
int j;
ogg_int16_t __attribute__((aligned(8))) ll[4];
unsigned char *cp = pbi->display_fragments;
ogg_uint32_t *bp = pbi->recon_pixel_index_table;
if ( FLimit == 0 ) return;
ll[0]=ll[1]=ll[2]=ll[3]=FLimit;
for ( j = 0; j < 3 ; j++){
ogg_uint32_t *bp_begin = bp;
ogg_uint32_t *bp_end;
int stride;
int h;
switch(j) {
case 0: /* y */
bp_end = bp + pbi->YPlaneFragments;
h = pbi->HFragments;
stride = pbi->YStride;
break;
default: /* u,v, 4:2:0 specific */
bp_end = bp + pbi->UVPlaneFragments;
h = pbi->HFragments >> 1;
stride = pbi->UVStride;
break;
}
while(bp<bp_end){
ogg_uint32_t *bp_left = bp;
ogg_uint32_t *bp_right = bp + h;
while(bp<bp_right){
if(cp[0]){
if(bp>bp_left)
loop_filter_h(&pbi->LastFrameRecon[bp[0]],stride,ll);
if(bp_left>bp_begin)
loop_filter_v(&pbi->LastFrameRecon[bp[0]],stride,ll);
if(bp+1<bp_right && !cp[1])
loop_filter_h(&pbi->LastFrameRecon[bp[0]]+8,stride,ll);
if(bp+h<bp_end && !cp[h])
loop_filter_v(&pbi->LastFrameRecon[bp[h]],stride,ll);
}
bp++;
cp++;
}
}
}
__asm__ __volatile__("emms\n\t");
}
/* install our implementation in the function table */
void dsp_mmx_dct_decode_init(DspFunctions *funcs)
{
funcs->LoopFilter = loop_filter_mmx;
}
#endif /* USE_ASM */
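
The comments in the assembly above describe the loop filter arithmetic fairly completely, so a scalar version can be sketched from them: compute the filter value from four pixels across the edge, limit it with `lflim`, then adjust the two pixels nearest the edge. The function names below are hypothetical, and the floor division is written out explicitly because right-shifting a negative `int` is implementation-defined in C (the `psraw` instruction is an arithmetic shift).

```c
/* Limiter as described by the assembly comments: values beyond +/-2L are
   dropped, and values between L and 2L (in magnitude) are reflected. */
static int sketch_lflim(int r, int l)
{
  if (r < -2 * l || r > 2 * l) return 0;
  if (r < -l) return -(r + 2 * l);
  if (r > l) return -(r - 2 * l);
  return r;
}

/* Division by 8 rounding toward negative infinity, matching psraw $3. */
static int sketch_floor_div8(int f)
{
  return f >= 0 ? f / 8 : -((-f + 7) / 8);
}

static unsigned char sketch_clamp_u8(int v)
{
  return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Filter one line of four pixels p[0..3] that straddles a block edge
   (the edge lies between p[1] and p[2]); l is the filter limit. */
static void sketch_loop_filter4(unsigned char p[4], int l)
{
  int f = p[0] - p[3] + 3 * (p[2] - p[1]) + 4;
  int r = sketch_lflim(sketch_floor_div8(f), l);
  p[1] = sketch_clamp_u8(p[1] + r);
  p[2] = sketch_clamp_u8(p[2] - r);
}
```

The vertical filter applies this to four rows at each of eight columns, and the horizontal filter to four columns at each of eight rows, which is why the MMX horizontal path needs the transpose.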

View file

@ -1,666 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dsp_mmx.c 15153 2008-08-04 18:37:55Z tterribe $
********************************************************************/
#include <stdlib.h>
#include "../codec_internal.h"
#include "../dsp.h"
#if defined(USE_ASM)
static const __attribute__ ((aligned(8),used)) ogg_int64_t V128 = 0x0080008000800080LL;
#define DSP_OP_AVG(a,b) ((((int)(a)) + ((int)(b)))/2)
#define DSP_OP_DIFF(a,b) (((int)(a)) - ((int)(b)))
#define DSP_OP_ABS_DIFF(a,b) abs((((int)(a)) - ((int)(b))))
#define SUB_LOOP \
" movq (%0), %%mm0 \n\t" /* mm0 = FiltPtr */ \
" movq (%1), %%mm1 \n\t" /* mm1 = ReconPtr */ \
" movq %%mm0, %%mm2 \n\t" /* dup to prepare for up conversion */\
" movq %%mm1, %%mm3 \n\t" /* dup to prepare for up conversion */\
/* convert from UINT8 to INT16 */ \
" punpcklbw %%mm7, %%mm0 \n\t" /* mm0 = INT16(FiltPtr) */ \
" punpcklbw %%mm7, %%mm1 \n\t" /* mm1 = INT16(ReconPtr) */ \
" punpckhbw %%mm7, %%mm2 \n\t" /* mm2 = INT16(FiltPtr) */ \
" punpckhbw %%mm7, %%mm3 \n\t" /* mm3 = INT16(ReconPtr) */ \
/* start calculation */ \
" psubw %%mm1, %%mm0 \n\t" /* mm0 = FiltPtr - ReconPtr */ \
" psubw %%mm3, %%mm2 \n\t" /* mm2 = FiltPtr - ReconPtr */ \
" movq %%mm0, (%2) \n\t" /* write answer out */ \
" movq %%mm2, 8(%2) \n\t" /* write answer out */ \
/* Increment pointers */ \
" add $16, %2 \n\t" \
" add %3, %0 \n\t" \
" add %4, %1 \n\t"
static void sub8x8__mmx (unsigned char *FiltPtr, unsigned char *ReconPtr,
ogg_int16_t *DctInputPtr, ogg_uint32_t PixelsPerLine,
ogg_uint32_t ReconPixelsPerLine)
{
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm7, %%mm7 \n\t"
SUB_LOOP
SUB_LOOP
SUB_LOOP
SUB_LOOP
SUB_LOOP
SUB_LOOP
SUB_LOOP
SUB_LOOP
: "+r" (FiltPtr),
"+r" (ReconPtr),
"+r" (DctInputPtr)
: "m" (PixelsPerLine),
"m" (ReconPixelsPerLine)
: "memory"
);
}
#define SUB_128_LOOP \
" movq (%0), %%mm0 \n\t" /* mm0 = FiltPtr */ \
" movq %%mm0, %%mm2 \n\t" /* dup to prepare for up conversion */\
/* convert from UINT8 to INT16 */ \
" punpcklbw %%mm7, %%mm0 \n\t" /* mm0 = INT16(FiltPtr) */ \
" punpckhbw %%mm7, %%mm2 \n\t" /* mm2 = INT16(FiltPtr) */ \
/* start calculation */ \
" psubw %%mm1, %%mm0 \n\t" /* mm0 = FiltPtr - 128 */ \
" psubw %%mm1, %%mm2 \n\t" /* mm2 = FiltPtr - 128 */ \
" movq %%mm0, (%1) \n\t" /* write answer out */ \
" movq %%mm2, 8(%1) \n\t" /* write answer out */ \
/* Increment pointers */ \
" add $16, %1 \n\t" \
" add %2, %0 \n\t"
static void sub8x8_128__mmx (unsigned char *FiltPtr, ogg_int16_t *DctInputPtr,
ogg_uint32_t PixelsPerLine)
{
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm7, %%mm7 \n\t"
" movq %[V128], %%mm1 \n\t"
SUB_128_LOOP
SUB_128_LOOP
SUB_128_LOOP
SUB_128_LOOP
SUB_128_LOOP
SUB_128_LOOP
SUB_128_LOOP
SUB_128_LOOP
: "+r" (FiltPtr),
"+r" (DctInputPtr)
: "m" (PixelsPerLine),
[V128] "m" (V128)
: "memory"
);
}
#define SUB_AVG2_LOOP \
" movq (%0), %%mm0 \n\t" /* mm0 = FiltPtr */ \
" movq (%1), %%mm1 \n\t" /* mm1 = ReconPtr1 */ \
" movq (%2), %%mm4 \n\t" /* mm4 = ReconPtr2 */ \
" movq %%mm0, %%mm2 \n\t" /* dup to prepare for up conversion */\
" movq %%mm1, %%mm3 \n\t" /* dup to prepare for up conversion */\
" movq %%mm4, %%mm5 \n\t" /* dup to prepare for up conversion */\
/* convert from UINT8 to INT16 */ \
" punpcklbw %%mm7, %%mm0 \n\t" /* mm0 = INT16(FiltPtr) */ \
" punpcklbw %%mm7, %%mm1 \n\t" /* mm1 = INT16(ReconPtr1) */ \
" punpcklbw %%mm7, %%mm4 \n\t" /* mm4 = INT16(ReconPtr2) */ \
" punpckhbw %%mm7, %%mm2 \n\t" /* mm2 = INT16(FiltPtr) */ \
" punpckhbw %%mm7, %%mm3 \n\t" /* mm3 = INT16(ReconPtr1) */ \
" punpckhbw %%mm7, %%mm5 \n\t" /* mm5 = INT16(ReconPtr2) */ \
/* average ReconPtr1 and ReconPtr2 */ \
" paddw %%mm4, %%mm1 \n\t" /* mm1 = ReconPtr1 + ReconPtr2 */ \
" paddw %%mm5, %%mm3 \n\t" /* mm3 = ReconPtr1 + ReconPtr2 */ \
" psrlw $1, %%mm1 \n\t" /* mm1 = (ReconPtr1 + ReconPtr2) / 2 */ \
" psrlw $1, %%mm3 \n\t" /* mm3 = (ReconPtr1 + ReconPtr2) / 2 */ \
" psubw %%mm1, %%mm0 \n\t" /* mm0 = FiltPtr - ((ReconPtr1 + ReconPtr2) / 2) */ \
" psubw %%mm3, %%mm2 \n\t" /* mm2 = FiltPtr - ((ReconPtr1 + ReconPtr2) / 2) */ \
" movq %%mm0, (%3) \n\t" /* write answer out */ \
" movq %%mm2, 8(%3) \n\t" /* write answer out */ \
/* Increment pointers */ \
" add $16, %3 \n\t" \
" add %4, %0 \n\t" \
" add %5, %1 \n\t" \
" add %5, %2 \n\t"
static void sub8x8avg2__mmx (unsigned char *FiltPtr, unsigned char *ReconPtr1,
unsigned char *ReconPtr2, ogg_int16_t *DctInputPtr,
ogg_uint32_t PixelsPerLine,
ogg_uint32_t ReconPixelsPerLine)
{
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm7, %%mm7 \n\t"
SUB_AVG2_LOOP
SUB_AVG2_LOOP
SUB_AVG2_LOOP
SUB_AVG2_LOOP
SUB_AVG2_LOOP
SUB_AVG2_LOOP
SUB_AVG2_LOOP
SUB_AVG2_LOOP
: "+r" (FiltPtr),
"+r" (ReconPtr1),
"+r" (ReconPtr2),
"+r" (DctInputPtr)
: "m" (PixelsPerLine),
"m" (ReconPixelsPerLine)
: "memory"
);
}
static ogg_uint32_t row_sad8__mmx (unsigned char *Src1, unsigned char *Src2)
{
ogg_uint32_t MaxSad;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm6, %%mm6 \n\t" /* zero out mm6 for unpack */
" pxor %%mm7, %%mm7 \n\t" /* zero out mm7 for unpack */
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t"
" movq %%mm0, %%mm2 \n\t"
" psubusb %%mm1, %%mm0 \n\t" /* A - B */
" psubusb %%mm2, %%mm1 \n\t" /* B - A */
" por %%mm1, %%mm0 \n\t" /* and or gives abs difference */
" movq %%mm0, %%mm1 \n\t"
" punpcklbw %%mm6, %%mm0 \n\t" /* unpack low four bytes to higher precision */
" punpckhbw %%mm7, %%mm1 \n\t" /* unpack high four bytes to higher precision */
" movq %%mm0, %%mm2 \n\t"
" movq %%mm1, %%mm3 \n\t"
" psrlq $32, %%mm2 \n\t" /* fold and add */
" psrlq $32, %%mm3 \n\t"
" paddw %%mm2, %%mm0 \n\t"
" paddw %%mm3, %%mm1 \n\t"
" movq %%mm0, %%mm2 \n\t"
" movq %%mm1, %%mm3 \n\t"
" psrlq $16, %%mm2 \n\t"
" psrlq $16, %%mm3 \n\t"
" paddw %%mm2, %%mm0 \n\t"
" paddw %%mm3, %%mm1 \n\t"
" psubusw %%mm0, %%mm1 \n\t"
" paddw %%mm0, %%mm1 \n\t" /* mm1 = max(mm1, mm0) */
" movd %%mm1, %0 \n\t"
" andl $0xffff, %0 \n\t"
: "=m" (MaxSad),
"+r" (Src1),
"+r" (Src2)
:
: "memory"
);
return MaxSad;
}
static ogg_uint32_t col_sad8x8__mmx (unsigned char *Src1, unsigned char *Src2,
ogg_uint32_t stride)
{
ogg_uint32_t MaxSad;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm3, %%mm3 \n\t" /* zero out mm3 for unpack */
" pxor %%mm4, %%mm4 \n\t" /* mm4 low sum */
" pxor %%mm5, %%mm5 \n\t" /* mm5 high sum */
" pxor %%mm6, %%mm6 \n\t" /* mm6 low sum */
" pxor %%mm7, %%mm7 \n\t" /* mm7 high sum */
" mov $4, %%edi \n\t" /* 4 rows */
"1: \n\t"
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t" /* take 8 bytes */
" movq %%mm0, %%mm2 \n\t"
" psubusb %%mm1, %%mm0 \n\t" /* A - B */
" psubusb %%mm2, %%mm1 \n\t" /* B - A */
" por %%mm1, %%mm0 \n\t" /* and or gives abs difference */
" movq %%mm0, %%mm1 \n\t"
" punpcklbw %%mm3, %%mm0 \n\t" /* unpack to higher precision for accumulation */
" paddw %%mm0, %%mm4 \n\t" /* accumulate difference... */
" punpckhbw %%mm3, %%mm1 \n\t" /* unpack high four bytes to higher precision */
" paddw %%mm1, %%mm5 \n\t" /* accumulate difference... */
" add %3, %1 \n\t" /* Inc pointer into the new data */
" add %3, %2 \n\t" /* Inc pointer into the new data */
" dec %%edi \n\t"
" jnz 1b \n\t"
" mov $4, %%edi \n\t" /* 4 rows */
"2: \n\t"
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t" /* take 8 bytes */
" movq %%mm0, %%mm2 \n\t"
" psubusb %%mm1, %%mm0 \n\t" /* A - B */
" psubusb %%mm2, %%mm1 \n\t" /* B - A */
" por %%mm1, %%mm0 \n\t" /* and or gives abs difference */
" movq %%mm0, %%mm1 \n\t"
" punpcklbw %%mm3, %%mm0 \n\t" /* unpack to higher precision for accumulation */
" paddw %%mm0, %%mm6 \n\t" /* accumulate difference... */
" punpckhbw %%mm3, %%mm1 \n\t" /* unpack high four bytes to higher precision */
" paddw %%mm1, %%mm7 \n\t" /* accumulate difference... */
" add %3, %1 \n\t" /* Inc pointer into the new data */
" add %3, %2 \n\t" /* Inc pointer into the new data */
" dec %%edi \n\t"
" jnz 2b \n\t"
" psubusw %%mm6, %%mm7 \n\t"
" paddw %%mm6, %%mm7 \n\t" /* mm7 = max(mm7, mm6) */
" psubusw %%mm4, %%mm5 \n\t"
" paddw %%mm4, %%mm5 \n\t" /* mm5 = max(mm5, mm4) */
" psubusw %%mm5, %%mm7 \n\t"
" paddw %%mm5, %%mm7 \n\t" /* mm7 = max(mm5, mm7) */
" movq %%mm7, %%mm6 \n\t"
" psrlq $32, %%mm6 \n\t"
" psubusw %%mm6, %%mm7 \n\t"
" paddw %%mm6, %%mm7 \n\t" /* mm7 = max(mm7, mm7 >> 32) */
" movq %%mm7, %%mm6 \n\t"
" psrlq $16, %%mm6 \n\t"
" psubusw %%mm6, %%mm7 \n\t"
" paddw %%mm6, %%mm7 \n\t" /* mm7 = max(mm7, mm7 >> 16) */
" movd %%mm7, %0 \n\t"
" andl $0xffff, %0 \n\t"
: "=r" (MaxSad),
"+r" (Src1),
"+r" (Src2)
: "r" (stride)
: "memory", "edi"
);
return MaxSad;
}
#define SAD_LOOP \
" movq (%1), %%mm0 \n\t" /* take 8 bytes */ \
" movq (%2), %%mm1 \n\t" \
" movq %%mm0, %%mm2 \n\t" \
" psubusb %%mm1, %%mm0 \n\t" /* A - B */ \
" psubusb %%mm2, %%mm1 \n\t" /* B - A */ \
" por %%mm1, %%mm0 \n\t" /* and or gives abs difference */ \
" movq %%mm0, %%mm1 \n\t" \
" punpcklbw %%mm6, %%mm0 \n\t" /* unpack to higher precision for accumulation */ \
" paddw %%mm0, %%mm7 \n\t" /* accumulate difference... */ \
" punpckhbw %%mm6, %%mm1 \n\t" /* unpack high four bytes to higher precision */ \
" add %3, %1 \n\t" /* Inc pointer into the new data */ \
" paddw %%mm1, %%mm7 \n\t" /* accumulate difference... */ \
" add %4, %2 \n\t" /* Inc pointer into ref data */
static ogg_uint32_t sad8x8__mmx (unsigned char *ptr1, ogg_uint32_t stride1,
unsigned char *ptr2, ogg_uint32_t stride2)
{
ogg_uint32_t DiffVal;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm6, %%mm6 \n\t" /* zero out mm6 for unpack */
" pxor %%mm7, %%mm7 \n\t" /* mm7 contains the result */
SAD_LOOP
SAD_LOOP
SAD_LOOP
SAD_LOOP
SAD_LOOP
SAD_LOOP
SAD_LOOP
SAD_LOOP
" movq %%mm7, %%mm0 \n\t"
" psrlq $32, %%mm7 \n\t"
" paddw %%mm0, %%mm7 \n\t"
" movq %%mm7, %%mm0 \n\t"
" psrlq $16, %%mm7 \n\t"
" paddw %%mm0, %%mm7 \n\t"
" movd %%mm7, %0 \n\t"
" andl $0xffff, %0 \n\t"
: "=m" (DiffVal),
"+r" (ptr1),
"+r" (ptr2)
: "r" (stride1),
"r" (stride2)
: "memory"
);
return DiffVal;
}
static ogg_uint32_t sad8x8_thres__mmx (unsigned char *ptr1, ogg_uint32_t stride1,
unsigned char *ptr2, ogg_uint32_t stride2,
ogg_uint32_t thres)
{
return sad8x8__mmx (ptr1, stride1, ptr2, stride2);
}
static ogg_uint32_t sad8x8_xy2_thres__mmx (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr1,
unsigned char *RefDataPtr2, ogg_uint32_t RefStride,
ogg_uint32_t thres)
{
ogg_uint32_t DiffVal;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pcmpeqd %%mm5, %%mm5 \n\t" /* fefefefefefefefe in mm5 */
" paddb %%mm5, %%mm5 \n\t"
" pxor %%mm6, %%mm6 \n\t" /* zero out mm6 for unpack */
" pxor %%mm7, %%mm7 \n\t" /* mm7 contains the result */
" mov $8, %%edi \n\t" /* 8 rows */
"1: \n\t"
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm2 \n\t"
" movq (%3), %%mm3 \n\t" /* take average of mm2 and mm3 */
" movq %%mm2, %%mm1 \n\t"
" pand %%mm3, %%mm1 \n\t"
" pxor %%mm2, %%mm3 \n\t"
" pand %%mm5, %%mm3 \n\t"
" psrlq $1, %%mm3 \n\t"
" paddb %%mm3, %%mm1 \n\t"
" movq %%mm0, %%mm2 \n\t"
" psubusb %%mm1, %%mm0 \n\t" /* A - B */
" psubusb %%mm2, %%mm1 \n\t" /* B - A */
" por %%mm1, %%mm0 \n\t" /* and or gives abs difference */
" movq %%mm0, %%mm1 \n\t"
" punpcklbw %%mm6, %%mm0 \n\t" /* unpack to higher precision for accumulation */
" paddw %%mm0, %%mm7 \n\t" /* accumulate difference... */
" punpckhbw %%mm6, %%mm1 \n\t" /* unpack high four bytes to higher precision */
" add %4, %1 \n\t" /* Inc pointer into the new data */
" paddw %%mm1, %%mm7 \n\t" /* accumulate difference... */
" add %5, %2 \n\t" /* Inc pointer into ref data */
" add %5, %3 \n\t" /* Inc pointer into ref data */
" dec %%edi \n\t"
" jnz 1b \n\t"
" movq %%mm7, %%mm0 \n\t"
" psrlq $32, %%mm7 \n\t"
" paddw %%mm0, %%mm7 \n\t"
" movq %%mm7, %%mm0 \n\t"
" psrlq $16, %%mm7 \n\t"
" paddw %%mm0, %%mm7 \n\t"
" movd %%mm7, %0 \n\t"
" andl $0xffff, %0 \n\t"
: "=m" (DiffVal),
"+r" (SrcData),
"+r" (RefDataPtr1),
"+r" (RefDataPtr2)
: "m" (SrcStride),
"m" (RefStride)
: "edi", "memory"
);
return DiffVal;
}
static ogg_uint32_t intra8x8_err__mmx (unsigned char *DataPtr, ogg_uint32_t Stride)
{
ogg_uint32_t XSum;
ogg_uint32_t XXSum;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm5, %%mm5 \n\t"
" pxor %%mm6, %%mm6 \n\t"
" pxor %%mm7, %%mm7 \n\t"
" mov $8, %%edi \n\t"
"1: \n\t"
" movq (%2), %%mm0 \n\t" /* take 8 bytes */
" movq %%mm0, %%mm2 \n\t"
" punpcklbw %%mm6, %%mm0 \n\t"
" punpckhbw %%mm6, %%mm2 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" paddw %%mm2, %%mm5 \n\t"
" pmaddwd %%mm0, %%mm0 \n\t"
" pmaddwd %%mm2, %%mm2 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" paddd %%mm2, %%mm7 \n\t"
" add %3, %2 \n\t" /* Inc pointer into src data */
" dec %%edi \n\t"
" jnz 1b \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $32, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $16, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movd %%mm5, %%edi \n\t"
" movsx %%di, %%edi \n\t"
" movl %%edi, %0 \n\t"
" movq %%mm7, %%mm0 \n\t"
" psrlq $32, %%mm7 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" movd %%mm7, %1 \n\t"
: "=r" (XSum),
"=r" (XXSum),
"+r" (DataPtr)
: "r" (Stride)
: "edi", "memory"
);
/* Compute the population variance, scaled by 64*64, as the mismatch metric. */
return (( (XXSum<<6) - XSum*XSum ) );
}
static ogg_uint32_t inter8x8_err__mmx (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr, ogg_uint32_t RefStride)
{
ogg_uint32_t XSum;
ogg_uint32_t XXSum;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm5, %%mm5 \n\t"
" pxor %%mm6, %%mm6 \n\t"
" pxor %%mm7, %%mm7 \n\t"
" mov $8, %%edi \n\t"
"1: \n\t"
" movq (%2), %%mm0 \n\t" /* take 8 bytes */
" movq (%3), %%mm1 \n\t"
" movq %%mm0, %%mm2 \n\t"
" movq %%mm1, %%mm3 \n\t"
" punpcklbw %%mm6, %%mm0 \n\t"
" punpcklbw %%mm6, %%mm1 \n\t"
" punpckhbw %%mm6, %%mm2 \n\t"
" punpckhbw %%mm6, %%mm3 \n\t"
" psubsw %%mm1, %%mm0 \n\t"
" psubsw %%mm3, %%mm2 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" paddw %%mm2, %%mm5 \n\t"
" pmaddwd %%mm0, %%mm0 \n\t"
" pmaddwd %%mm2, %%mm2 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" paddd %%mm2, %%mm7 \n\t"
" add %4, %2 \n\t" /* Inc pointer into src data */
" add %5, %3 \n\t" /* Inc pointer into ref data */
" dec %%edi \n\t"
" jnz 1b \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $32, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $16, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movd %%mm5, %%edi \n\t"
" movsx %%di, %%edi \n\t"
" movl %%edi, %0 \n\t"
" movq %%mm7, %%mm0 \n\t"
" psrlq $32, %%mm7 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" movd %%mm7, %1 \n\t"
: "=m" (XSum),
"=m" (XXSum),
"+r" (SrcData),
"+r" (RefDataPtr)
: "m" (SrcStride),
"m" (RefStride)
: "edi", "memory"
);
/* Compute and return the population variance, scaled by 64*64, as the mismatch metric. */
return (( (XXSum<<6) - XSum*XSum ));
}
static ogg_uint32_t inter8x8_err_xy2__mmx (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr1,
unsigned char *RefDataPtr2, ogg_uint32_t RefStride)
{
ogg_uint32_t XSum;
ogg_uint32_t XXSum;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pcmpeqd %%mm4, %%mm4 \n\t" /* fefefefefefefefe in mm4 */
" paddb %%mm4, %%mm4 \n\t"
" pxor %%mm5, %%mm5 \n\t"
" pxor %%mm6, %%mm6 \n\t"
" pxor %%mm7, %%mm7 \n\t"
" mov $8, %%edi \n\t"
"1: \n\t"
" movq (%2), %%mm0 \n\t" /* take 8 bytes */
" movq (%3), %%mm2 \n\t"
" movq (%4), %%mm3 \n\t" /* take average of mm2 and mm3 */
" movq %%mm2, %%mm1 \n\t"
" pand %%mm3, %%mm1 \n\t"
" pxor %%mm2, %%mm3 \n\t"
" pand %%mm4, %%mm3 \n\t"
" psrlq $1, %%mm3 \n\t"
" paddb %%mm3, %%mm1 \n\t"
" movq %%mm0, %%mm2 \n\t"
" movq %%mm1, %%mm3 \n\t"
" punpcklbw %%mm6, %%mm0 \n\t"
" punpcklbw %%mm6, %%mm1 \n\t"
" punpckhbw %%mm6, %%mm2 \n\t"
" punpckhbw %%mm6, %%mm3 \n\t"
" psubsw %%mm1, %%mm0 \n\t"
" psubsw %%mm3, %%mm2 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" paddw %%mm2, %%mm5 \n\t"
" pmaddwd %%mm0, %%mm0 \n\t"
" pmaddwd %%mm2, %%mm2 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" paddd %%mm2, %%mm7 \n\t"
" add %5, %2 \n\t" /* Inc pointer into src data */
" add %6, %3 \n\t" /* Inc pointer into ref data */
" add %6, %4 \n\t" /* Inc pointer into ref data */
" dec %%edi \n\t"
" jnz 1b \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $32, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $16, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movd %%mm5, %%edi \n\t"
" movsx %%di, %%edi \n\t"
" movl %%edi, %0 \n\t"
" movq %%mm7, %%mm0 \n\t"
" psrlq $32, %%mm7 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" movd %%mm7, %1 \n\t"
: "=m" (XSum),
"=m" (XXSum),
"+r" (SrcData),
"+r" (RefDataPtr1),
"+r" (RefDataPtr2)
: "m" (SrcStride),
"m" (RefStride)
: "edi", "memory"
);
/* Compute and return the population variance, scaled by 64*64, as the mismatch metric. */
return (( (XXSum<<6) - XSum*XSum ));
}
static void restore_fpu (void)
{
__asm__ __volatile__ (
" emms \n\t"
);
}
void dsp_mmx_init(DspFunctions *funcs)
{
funcs->restore_fpu = restore_fpu;
funcs->sub8x8 = sub8x8__mmx;
funcs->sub8x8_128 = sub8x8_128__mmx;
funcs->sub8x8avg2 = sub8x8avg2__mmx;
funcs->row_sad8 = row_sad8__mmx;
funcs->col_sad8x8 = col_sad8x8__mmx;
funcs->sad8x8 = sad8x8__mmx;
funcs->sad8x8_thres = sad8x8_thres__mmx;
funcs->sad8x8_xy2_thres = sad8x8_xy2_thres__mmx;
funcs->intra8x8_err = intra8x8_err__mmx;
funcs->inter8x8_err = inter8x8_err__mmx;
funcs->inter8x8_err_xy2 = inter8x8_err_xy2__mmx;
}
#endif /* USE_ASM */
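
The MMX loop in inter8x8_err_xy2__mmx is easier to check against a scalar version. The following is only a sketch of the same arithmetic, not code taken from libtheora: the names `avg_floor_u8` and `inter8x8_err_xy2_ref` are made up for illustration, and strides are assumed to be byte counts. The average mirrors the pand/pxor/psrlq sequence, which truncates rather than rounds.

```c
#include <stdint.h>

/* Truncating average of two bytes without a 9-bit intermediate:
   (a & b) keeps the bits common to both, ((a ^ b) >> 1) adds half of the rest. */
static uint8_t avg_floor_u8(uint8_t a, uint8_t b)
{
  return (uint8_t)((a & b) + ((a ^ b) >> 1));
}

/* Hypothetical scalar reference; strides are in bytes. */
static uint32_t inter8x8_err_xy2_ref(const uint8_t *src, uint32_t src_stride,
                                     const uint8_t *ref1, const uint8_t *ref2,
                                     uint32_t ref_stride)
{
  int32_t xsum = 0;   /* sum of differences */
  uint32_t xxsum = 0; /* sum of squared differences */
  int row, col;
  for (row = 0; row < 8; row++) {
    for (col = 0; col < 8; col++) {
      int d = src[col] - avg_floor_u8(ref1[col], ref2[col]);
      xsum += d;
      xxsum += (uint32_t)(d * d);
    }
    src += src_stride;
    ref1 += ref_stride;
    ref2 += ref_stride;
  }
  /* 64*sum(d^2) - (sum d)^2, which is 64^2 times the population variance. */
  return (xxsum << 6) - (uint32_t)(xsum * xsum);
}
```

A block whose difference is constant has zero variance, so the metric is 0 regardless of the offset; only the spread of the differences contributes.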

View file

@ -1,347 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dsp_mmxext.c 15153 2008-08-04 18:37:55Z tterribe $
********************************************************************/
#include <stdlib.h>
#include "../codec_internal.h"
#include "../dsp.h"
#if defined(USE_ASM)
#define SAD_MMXEXT_LOOP \
" movq (%1), %%mm0 \n\t" /* take 8 bytes */ \
" movq (%2), %%mm1 \n\t" \
" psadbw %%mm1, %%mm0 \n\t" \
" add %3, %1 \n\t" /* Inc pointer into the new data */ \
" paddw %%mm0, %%mm7 \n\t" /* accumulate difference... */ \
" add %4, %2 \n\t" /* Inc pointer into ref data */
static ogg_uint32_t sad8x8__mmxext (unsigned char *ptr1, ogg_uint32_t stride1,
unsigned char *ptr2, ogg_uint32_t stride2)
{
ogg_uint32_t DiffVal;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm7, %%mm7 \n\t" /* mm7 contains the result */
SAD_MMXEXT_LOOP
SAD_MMXEXT_LOOP
SAD_MMXEXT_LOOP
SAD_MMXEXT_LOOP
SAD_MMXEXT_LOOP
SAD_MMXEXT_LOOP
SAD_MMXEXT_LOOP
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t"
" psadbw %%mm1, %%mm0 \n\t"
" paddw %%mm0, %%mm7 \n\t" /* accumulate difference... */
" movd %%mm7, %0 \n\t"
: "=r" (DiffVal),
"+r" (ptr1),
"+r" (ptr2)
: "r" (stride1),
"r" (stride2)
: "memory"
);
return DiffVal;
}
#define SAD_TRES_LOOP \
" movq (%1), %%mm0 \n\t" /* take 8 bytes */ \
" movq (%2), %%mm1 \n\t" \
" psadbw %%mm1, %%mm0 \n\t" \
" add %3, %1 \n\t" /* Inc pointer into the new data */ \
" paddw %%mm0, %%mm7 \n\t" /* accumulate difference... */ \
" add %4, %2 \n\t" /* Inc pointer into ref data */
static ogg_uint32_t sad8x8_thres__mmxext (unsigned char *ptr1, ogg_uint32_t stride1,
unsigned char *ptr2, ogg_uint32_t stride2,
ogg_uint32_t thres)
{
ogg_uint32_t DiffVal;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm7, %%mm7 \n\t" /* mm7 contains the result */
SAD_TRES_LOOP
SAD_TRES_LOOP
SAD_TRES_LOOP
SAD_TRES_LOOP
SAD_TRES_LOOP
SAD_TRES_LOOP
SAD_TRES_LOOP
SAD_TRES_LOOP
" movd %%mm7, %0 \n\t"
: "=r" (DiffVal),
"+r" (ptr1),
"+r" (ptr2)
: "r" (stride1),
"r" (stride2)
: "memory"
);
return DiffVal;
}
#define SAD_XY2_TRES \
" movq (%1), %%mm0 \n\t" /* take 8 bytes */ \
" movq (%2), %%mm1 \n\t" \
" movq (%3), %%mm2 \n\t" \
" pavgb %%mm2, %%mm1 \n\t" \
" psadbw %%mm1, %%mm0 \n\t" \
\
" add %4, %1 \n\t" /* Inc pointer into the new data */ \
" paddw %%mm0, %%mm7 \n\t" /* accumulate difference... */ \
" add %5, %2 \n\t" /* Inc pointer into ref data */ \
" add %5, %3 \n\t" /* Inc pointer into ref data */
static ogg_uint32_t sad8x8_xy2_thres__mmxext (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr1,
unsigned char *RefDataPtr2, ogg_uint32_t RefStride,
ogg_uint32_t thres)
{
ogg_uint32_t DiffVal;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm7, %%mm7 \n\t" /* mm7 contains the result */
SAD_XY2_TRES
SAD_XY2_TRES
SAD_XY2_TRES
SAD_XY2_TRES
SAD_XY2_TRES
SAD_XY2_TRES
SAD_XY2_TRES
SAD_XY2_TRES
" movd %%mm7, %0 \n\t"
: "=m" (DiffVal),
"+r" (SrcData),
"+r" (RefDataPtr1),
"+r" (RefDataPtr2)
: "m" (SrcStride),
"m" (RefStride)
: "memory"
);
return DiffVal;
}
static ogg_uint32_t row_sad8__mmxext (unsigned char *Src1, unsigned char *Src2)
{
ogg_uint32_t MaxSad;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" movd (%1), %%mm0 \n\t"
" movd (%2), %%mm1 \n\t"
" psadbw %%mm0, %%mm1 \n\t"
" movd 4(%1), %%mm2 \n\t"
" movd 4(%2), %%mm3 \n\t"
" psadbw %%mm2, %%mm3 \n\t"
" pmaxsw %%mm1, %%mm3 \n\t"
" movd %%mm3, %0 \n\t"
" andl $0xffff, %0 \n\t"
: "=m" (MaxSad),
"+r" (Src1),
"+r" (Src2)
:
: "memory"
);
return MaxSad;
}
static ogg_uint32_t col_sad8x8__mmxext (unsigned char *Src1, unsigned char *Src2,
ogg_uint32_t stride)
{
ogg_uint32_t MaxSad;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm3, %%mm3 \n\t" /* zero out mm3 for unpack */
" pxor %%mm4, %%mm4 \n\t" /* mm4 low sum */
" pxor %%mm5, %%mm5 \n\t" /* mm5 high sum */
" pxor %%mm6, %%mm6 \n\t" /* mm6 low sum */
" pxor %%mm7, %%mm7 \n\t" /* mm7 high sum */
" mov $4, %%edi \n\t" /* 4 rows */
"1: \n\t"
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t" /* take 8 bytes */
" movq %%mm0, %%mm2 \n\t"
" psubusb %%mm1, %%mm0 \n\t" /* A - B */
" psubusb %%mm2, %%mm1 \n\t" /* B - A */
" por %%mm1, %%mm0 \n\t" /* and or gives abs difference */
" movq %%mm0, %%mm1 \n\t"
" punpcklbw %%mm3, %%mm0 \n\t" /* unpack to higher precision for accumulation */
" paddw %%mm0, %%mm4 \n\t" /* accumulate difference... */
" punpckhbw %%mm3, %%mm1 \n\t" /* unpack high four bytes to higher precision */
" paddw %%mm1, %%mm5 \n\t" /* accumulate difference... */
" add %3, %1 \n\t" /* Inc pointer into the new data */
" add %3, %2 \n\t" /* Inc pointer into the new data */
" dec %%edi \n\t"
" jnz 1b \n\t"
" mov $4, %%edi \n\t" /* 4 rows */
"2: \n\t"
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t" /* take 8 bytes */
" movq %%mm0, %%mm2 \n\t"
" psubusb %%mm1, %%mm0 \n\t" /* A - B */
" psubusb %%mm2, %%mm1 \n\t" /* B - A */
" por %%mm1, %%mm0 \n\t" /* and or gives abs difference */
" movq %%mm0, %%mm1 \n\t"
" punpcklbw %%mm3, %%mm0 \n\t" /* unpack to higher precision for accumulation */
" paddw %%mm0, %%mm6 \n\t" /* accumulate difference... */
" punpckhbw %%mm3, %%mm1 \n\t" /* unpack high four bytes to higher precision */
" paddw %%mm1, %%mm7 \n\t" /* accumulate difference... */
" add %3, %1 \n\t" /* Inc pointer into the new data */
" add %3, %2 \n\t" /* Inc pointer into the new data */
" dec %%edi \n\t"
" jnz 2b \n\t"
" pmaxsw %%mm6, %%mm7 \n\t"
" pmaxsw %%mm4, %%mm5 \n\t"
" pmaxsw %%mm5, %%mm7 \n\t"
" movq %%mm7, %%mm6 \n\t"
" psrlq $32, %%mm6 \n\t"
" pmaxsw %%mm6, %%mm7 \n\t"
" movq %%mm7, %%mm6 \n\t"
" psrlq $16, %%mm6 \n\t"
" pmaxsw %%mm6, %%mm7 \n\t"
" movd %%mm7, %0 \n\t"
" andl $0xffff, %0 \n\t"
: "=r" (MaxSad),
"+r" (Src1),
"+r" (Src2)
: "r" (stride)
: "memory", "edi"
);
return MaxSad;
}
static ogg_uint32_t inter8x8_err_xy2__mmxext (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr1,
unsigned char *RefDataPtr2, ogg_uint32_t RefStride)
{
ogg_uint32_t XSum;
ogg_uint32_t XXSum;
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm4, %%mm4 \n\t"
" pxor %%mm5, %%mm5 \n\t"
" pxor %%mm6, %%mm6 \n\t"
" pxor %%mm7, %%mm7 \n\t"
" mov $8, %%edi \n\t"
"1: \n\t"
" movq (%2), %%mm0 \n\t" /* take 8 bytes */
" movq (%3), %%mm2 \n\t"
" movq (%4), %%mm1 \n\t" /* take average of mm2 and mm1 */
" pavgb %%mm2, %%mm1 \n\t"
" movq %%mm0, %%mm2 \n\t"
" movq %%mm1, %%mm3 \n\t"
" punpcklbw %%mm6, %%mm0 \n\t"
" punpcklbw %%mm4, %%mm1 \n\t"
" punpckhbw %%mm6, %%mm2 \n\t"
" punpckhbw %%mm4, %%mm3 \n\t"
" psubsw %%mm1, %%mm0 \n\t"
" psubsw %%mm3, %%mm2 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" paddw %%mm2, %%mm5 \n\t"
" pmaddwd %%mm0, %%mm0 \n\t"
" pmaddwd %%mm2, %%mm2 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" paddd %%mm2, %%mm7 \n\t"
" add %5, %2 \n\t" /* Inc pointer into src data */
" add %6, %3 \n\t" /* Inc pointer into ref data */
" add %6, %4 \n\t" /* Inc pointer into ref data */
" dec %%edi \n\t"
" jnz 1b \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $32, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $16, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movd %%mm5, %%edi \n\t"
" movsx %%di, %%edi \n\t"
" movl %%edi, %0 \n\t"
" movq %%mm7, %%mm0 \n\t"
" psrlq $32, %%mm7 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" movd %%mm7, %1 \n\t"
: "=m" (XSum),
"=m" (XXSum),
"+r" (SrcData),
"+r" (RefDataPtr1),
"+r" (RefDataPtr2)
: "m" (SrcStride),
"m" (RefStride)
: "edi", "memory"
);
/* Compute and return population variance as mis-match metric. */
return (( (XXSum<<6) - XSum*XSum ));
}
void dsp_mmxext_init(DspFunctions *funcs)
{
funcs->row_sad8 = row_sad8__mmxext;
funcs->col_sad8x8 = col_sad8x8__mmxext;
funcs->sad8x8 = sad8x8__mmxext;
funcs->sad8x8_thres = sad8x8_thres__mmxext;
funcs->sad8x8_xy2_thres = sad8x8_xy2_thres__mmxext;
funcs->inter8x8_err_xy2 = inter8x8_err_xy2__mmxext;
}
#endif /* USE_ASM */
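
For comparison with the MMXEXT routines, here is a scalar sketch of what psadbw and pavgb compute per block. It is an illustration under stated assumptions, not the library's C fallback: `avg_round_u8`, `sad8x8_ref` and `col_sad8x8_ref` are hypothetical names, and strides are taken to be byte counts. Note that pavgb rounds up, (a + b + 1) >> 1, so the MMXEXT xy2 routines are not bit-identical to a truncating average.

```c
#include <stdint.h>
#include <stdlib.h>

/* Rounding average, as pavgb computes it per byte. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
  return (uint8_t)((a + b + 1) >> 1);
}

/* Hypothetical scalar reference for an 8x8 sum of absolute differences. */
static uint32_t sad8x8_ref(const uint8_t *p1, uint32_t stride1,
                           const uint8_t *p2, uint32_t stride2)
{
  uint32_t sad = 0;
  int row, col;
  for (row = 0; row < 8; row++) {
    for (col = 0; col < 8; col++)
      sad += (uint32_t)abs(p1[col] - p2[col]);
    p1 += stride1;
    p2 += stride2;
  }
  return sad;
}

/* Largest per-column SAD, taken separately over rows 0-3 and rows 4-7,
   matching the two four-row loops in col_sad8x8__mmxext. */
static uint32_t col_sad8x8_ref(const uint8_t *p1, const uint8_t *p2,
                               uint32_t stride)
{
  uint32_t max_sad = 0;
  int half, row, col;
  for (half = 0; half < 2; half++) {
    uint32_t sums[8] = {0};
    for (row = 0; row < 4; row++) {
      for (col = 0; col < 8; col++)
        sums[col] += (uint32_t)abs(p1[col] - p2[col]);
      p1 += stride;
      p2 += stride;
    }
    for (col = 0; col < 8; col++)
      if (sums[col] > max_sad) max_sad = sums[col];
  }
  return max_sad;
}
```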

View file

@ -1,339 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: fdct_mmx.c 15153 2008-08-04 18:37:55Z tterribe $
********************************************************************/
/* mmx fdct implementation */
#include "theora/theora.h"
#include "../codec_internal.h"
#include "../dsp.h"
#if defined(USE_ASM)
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC1S7 = 0x0fb15fb15fb15fb15LL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC2S6 = 0x0ec83ec83ec83ec83LL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC3S5 = 0x0d4dbd4dbd4dbd4dbLL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC4S4 = 0x0b505b505b505b505LL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC5S3 = 0x08e3a8e3a8e3a8e3aLL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC6S2 = 0x061f861f861f861f8LL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC7S1 = 0x031f131f131f131f1LL;
/* execute stage 1 of forward DCT */
#define Fdct_mmx(ip0,ip1,ip2,ip3,ip4,ip5,ip6,ip7,temp) \
" movq " #ip0 ", %%mm0 \n\t" \
" movq " #ip1 ", %%mm1 \n\t" \
" movq " #ip3 ", %%mm2 \n\t" \
" movq " #ip5 ", %%mm3 \n\t" \
" movq %%mm0, %%mm4 \n\t" \
" movq %%mm1, %%mm5 \n\t" \
" movq %%mm2, %%mm6 \n\t" \
" movq %%mm3, %%mm7 \n\t" \
\
" paddsw " #ip7 ", %%mm0 \n\t" /* mm0 = ip0 + ip7 = is07 */ \
" paddsw " #ip2 ", %%mm1 \n\t" /* mm1 = ip1 + ip2 = is12 */ \
" paddsw " #ip4 ", %%mm2 \n\t" /* mm2 = ip3 + ip4 = is34 */ \
" paddsw " #ip6 ", %%mm3 \n\t" /* mm3 = ip5 + ip6 = is56 */ \
" psubsw " #ip7 ", %%mm4 \n\t" /* mm4 = ip0 - ip7 = id07 */ \
" psubsw " #ip2 ", %%mm5 \n\t" /* mm5 = ip1 - ip2 = id12 */ \
\
" psubsw %%mm2, %%mm0 \n\t" /* mm0 = is07 - is34 */ \
\
" paddsw %%mm2, %%mm2 \n\t" \
\
" psubsw " #ip4 ", %%mm6 \n\t" /* mm6 = ip3 - ip4 = id34 */ \
\
" paddsw %%mm0, %%mm2 \n\t" /* mm2 = is07 + is34 = is0734 */ \
" psubsw %%mm3, %%mm1 \n\t" /* mm1 = is12 - is56 */ \
" movq %%mm0," #temp " \n\t" /* Save is07 - is34 to free mm0 */ \
" paddsw %%mm3, %%mm3 \n\t" \
" paddsw %%mm1, %%mm3 \n\t" /* mm3 = is12 + is56 = is1256 */ \
\
" psubsw " #ip6 ", %%mm7 \n\t" /* mm7 = ip5 - ip6 = id56 */ \
/* ------------------------------------------------------------------- */ \
" psubsw %%mm7, %%mm5 \n\t" /* mm5 = id12 - id56 */ \
" paddsw %%mm7, %%mm7 \n\t" \
" paddsw %%mm5, %%mm7 \n\t" /* mm7 = id12 + id56 */ \
/* ------------------------------------------------------------------- */ \
" psubsw %%mm3, %%mm2 \n\t" /* mm2 = is0734 - is1256 */ \
" paddsw %%mm3, %%mm3 \n\t" \
\
" movq %%mm2, %%mm0 \n\t" /* make a copy */ \
" paddsw %%mm2, %%mm3 \n\t" /* mm3 = is0734 + is1256 */ \
\
" pmulhw %[xC4S4], %%mm0 \n\t" /* mm0 = xC4S4 * ( is0734 - is1256 ) - ( is0734 - is1256 ) */ \
" paddw %%mm2, %%mm0 \n\t" /* mm0 = xC4S4 * ( is0734 - is1256 ) */ \
" psrlw $15, %%mm2 \n\t" \
" paddw %%mm2, %%mm0 \n\t" /* Truncate mm0, now it is op[4] */ \
\
" movq %%mm3, %%mm2 \n\t" \
" movq %%mm0," #ip4 " \n\t" /* save ip4, now mm0,mm2 are free */ \
\
" movq %%mm3, %%mm0 \n\t" \
" pmulhw %[xC4S4], %%mm3 \n\t" /* mm3 = xC4S4 * ( is0734 +is1256 ) - ( is0734 +is1256 ) */ \
\
" psrlw $15, %%mm2 \n\t" \
" paddw %%mm0, %%mm3 \n\t" /* mm3 = xC4S4 * ( is0734 +is1256 ) */ \
" paddw %%mm2, %%mm3 \n\t" /* Truncate mm3, now it is op[0] */ \
\
" movq %%mm3," #ip0 " \n\t" \
/* ------------------------------------------------------------------- */ \
" movq " #temp ", %%mm3 \n\t" /* mm3 = irot_input_y */ \
" pmulhw %[xC2S6], %%mm3 \n\t" /* mm3 = xC2S6 * irot_input_y - irot_input_y */ \
\
" movq " #temp ", %%mm2 \n\t" \
" movq %%mm2, %%mm0 \n\t" \
\
" psrlw $15, %%mm2 \n\t" \
" paddw %%mm0, %%mm3 \n\t" /* mm3 = xC2S6 * irot_input_y */ \
\
" paddw %%mm2, %%mm3 \n\t" /* Truncated */ \
" movq %%mm5, %%mm0 \n\t" \
\
" movq %%mm5, %%mm2 \n\t" \
" pmulhw %[xC6S2], %%mm0 \n\t" /* mm0 = xC6S2 * irot_input_x */ \
\
" psrlw $15, %%mm2 \n\t" \
" paddw %%mm2, %%mm0 \n\t" /* Truncated */ \
\
" paddsw %%mm0, %%mm3 \n\t" /* ip[2] */ \
" movq %%mm3," #ip2 " \n\t" /* Save ip2 */ \
\
" movq %%mm5, %%mm0 \n\t" \
" movq %%mm5, %%mm2 \n\t" \
\
" pmulhw %[xC2S6], %%mm5 \n\t" /* mm5 = xC2S6 * irot_input_x - irot_input_x */ \
" psrlw $15, %%mm2 \n\t" \
\
" movq " #temp ", %%mm3 \n\t" \
" paddw %%mm0, %%mm5 \n\t" /* mm5 = xC2S6 * irot_input_x */ \
\
" paddw %%mm2, %%mm5 \n\t" /* Truncated */ \
" movq %%mm3, %%mm2 \n\t" \
\
" pmulhw %[xC6S2], %%mm3 \n\t" /* mm3 = xC6S2 * irot_input_y */ \
" psrlw $15, %%mm2 \n\t" \
\
" paddw %%mm2, %%mm3 \n\t" /* Truncated */ \
" psubsw %%mm5, %%mm3 \n\t" \
\
" movq %%mm3," #ip6 " \n\t" \
/* ------------------------------------------------------------------- */ \
" movq %[xC4S4], %%mm0 \n\t" \
" movq %%mm1, %%mm2 \n\t" \
" movq %%mm1, %%mm3 \n\t" \
\
" pmulhw %%mm0, %%mm1 \n\t" /* mm1 = xC4S4 * ( is12 - is56 ) - ( is12 - is56 ) */ \
" psrlw $15, %%mm2 \n\t" \
\
" paddw %%mm3, %%mm1 \n\t" /* mm1 = xC4S4 * ( is12 - is56 ) */ \
" paddw %%mm2, %%mm1 \n\t" /* Truncate mm1, now it is icommon_product1 */ \
\
" movq %%mm7, %%mm2 \n\t" \
" movq %%mm7, %%mm3 \n\t" \
\
" pmulhw %%mm0, %%mm7 \n\t" /* mm7 = xC4S4 * ( id12 + id56 ) - ( id12 + id56 ) */ \
" psrlw $15, %%mm2 \n\t" \
\
" paddw %%mm3, %%mm7 \n\t" /* mm7 = xC4S4 * ( id12 + id56 ) */ \
" paddw %%mm2, %%mm7 \n\t" /* Truncate mm7, now it is icommon_product2 */ \
/* ------------------------------------------------------------------- */ \
" pxor %%mm0, %%mm0 \n\t" /* Clear mm0 */ \
" psubsw %%mm6, %%mm0 \n\t" /* mm0 = - id34 */ \
\
" psubsw %%mm7, %%mm0 \n\t" /* mm0 = - ( id34 + icommon_product2 ) */ \
" paddsw %%mm6, %%mm6 \n\t" \
" paddsw %%mm0, %%mm6 \n\t" /* mm6 = id34 - icommon_product2 */ \
\
" psubsw %%mm1, %%mm4 \n\t" /* mm4 = id07 - icommon_product1 */ \
" paddsw %%mm1, %%mm1 \n\t" \
" paddsw %%mm4, %%mm1 \n\t" /* mm1 = id07 + icommon_product1 */ \
/* ------------------------------------------------------------------- */ \
" movq %[xC1S7], %%mm7 \n\t" \
" movq %%mm1, %%mm2 \n\t" \
\
" movq %%mm1, %%mm3 \n\t" \
" pmulhw %%mm7, %%mm1 \n\t" /* mm1 = xC1S7 * irot_input_x - irot_input_x */ \
\
" movq %[xC7S1], %%mm7 \n\t" \
" psrlw $15, %%mm2 \n\t" \
\
" paddw %%mm3, %%mm1 \n\t" /* mm1 = xC1S7 * irot_input_x */ \
" paddw %%mm2, %%mm1 \n\t" /* Truncated */ \
\
" pmulhw %%mm7, %%mm3 \n\t" /* mm3 = xC7S1 * irot_input_x */ \
" paddw %%mm2, %%mm3 \n\t" /* Truncated */ \
\
" movq %%mm0, %%mm5 \n\t" \
" movq %%mm0, %%mm2 \n\t" \
\
" movq %[xC1S7], %%mm7 \n\t" \
" pmulhw %%mm7, %%mm0 \n\t" /* mm0 = xC1S7 * irot_input_y - irot_input_y */ \
\
" movq %[xC7S1], %%mm7 \n\t" \
" psrlw $15, %%mm2 \n\t" \
\
" paddw %%mm5, %%mm0 \n\t" /* mm0 = xC1S7 * irot_input_y */ \
" paddw %%mm2, %%mm0 \n\t" /* Truncated */ \
\
" pmulhw %%mm7, %%mm5 \n\t" /* mm5 = xC7S1 * irot_input_y */ \
" paddw %%mm2, %%mm5 \n\t" /* Truncated */ \
\
" psubsw %%mm5, %%mm1 \n\t" /* mm1 = xC1S7 * irot_input_x - xC7S1 * irot_input_y = ip1 */ \
" paddsw %%mm0, %%mm3 \n\t" /* mm3 = xC7S1 * irot_input_x + xC1S7 * irot_input_y = ip7 */ \
\
" movq %%mm1," #ip1 " \n\t" \
" movq %%mm3," #ip7 " \n\t" \
/* ------------------------------------------------------------------- */ \
" movq %[xC3S5], %%mm0 \n\t" \
" movq %[xC5S3], %%mm1 \n\t" \
\
" movq %%mm6, %%mm5 \n\t" \
" movq %%mm6, %%mm7 \n\t" \
\
" movq %%mm4, %%mm2 \n\t" \
" movq %%mm4, %%mm3 \n\t" \
\
" pmulhw %%mm0, %%mm4 \n\t" /* mm4 = xC3S5 * irot_input_x - irot_input_x */ \
" pmulhw %%mm1, %%mm6 \n\t" /* mm6 = xC5S3 * irot_input_y - irot_input_y */ \
\
" psrlw $15, %%mm2 \n\t" \
" psrlw $15, %%mm5 \n\t" \
\
" paddw %%mm3, %%mm4 \n\t" /* mm4 = xC3S5 * irot_input_x */ \
" paddw %%mm7, %%mm6 \n\t" /* mm6 = xC5S3 * irot_input_y */ \
\
" paddw %%mm2, %%mm4 \n\t" /* Truncated */ \
" paddw %%mm5, %%mm6 \n\t" /* Truncated */ \
\
" psubsw %%mm6, %%mm4 \n\t" /* ip3 */ \
" movq %%mm4," #ip3 " \n\t" \
\
" movq %%mm3, %%mm4 \n\t" \
" movq %%mm7, %%mm6 \n\t" \
\
" pmulhw %%mm1, %%mm3 \n\t" /* mm3 = xC5S3 * irot_input_x - irot_input_x */ \
" pmulhw %%mm0, %%mm7 \n\t" /* mm7 = xC3S5 * irot_input_y - irot_input_y */ \
\
" paddw %%mm2, %%mm4 \n\t" \
" paddw %%mm5, %%mm6 \n\t" \
\
" paddw %%mm4, %%mm3 \n\t" /* mm3 = xC5S3 * irot_input_x */ \
" paddw %%mm6, %%mm7 \n\t" /* mm7 = xC3S5 * irot_input_y */ \
\
" paddw %%mm7, %%mm3 \n\t" /* ip5 */ \
" movq %%mm3," #ip5 " \n\t"
#define Transpose_mmx(ip0,ip1,ip2,ip3,ip4,ip5,ip6,ip7, \
op0,op1,op2,op3,op4,op5,op6,op7) \
" movq " #ip0 ", %%mm0 \n\t" /* mm0 = a0 a1 a2 a3 */ \
" movq " #ip4 ", %%mm4 \n\t" /* mm4 = e4 e5 e6 e7 */ \
" movq " #ip1 ", %%mm1 \n\t" /* mm1 = b0 b1 b2 b3 */ \
" movq " #ip5 ", %%mm5 \n\t" /* mm5 = f4 f5 f6 f7 */ \
" movq " #ip2 ", %%mm2 \n\t" /* mm2 = c0 c1 c2 c3 */ \
" movq " #ip6 ", %%mm6 \n\t" /* mm6 = g4 g5 g6 g7 */ \
" movq " #ip3 ", %%mm3 \n\t" /* mm3 = d0 d1 d2 d3 */ \
" movq %%mm1," #op1 " \n\t" /* save b0 b1 b2 b3 */ \
" movq " #ip7 ", %%mm7 \n\t" /* mm7 = h0 h1 h2 h3 */ \
/* Transpose 2x8 block */ \
" movq %%mm4, %%mm1 \n\t" /* mm1 = e3 e2 e1 e0 */ \
" punpcklwd %%mm5, %%mm4 \n\t" /* mm4 = f1 e1 f0 e0 */ \
" movq %%mm0," #op0 " \n\t" /* save a3 a2 a1 a0 */ \
" punpckhwd %%mm5, %%mm1 \n\t" /* mm1 = f3 e3 f2 e2 */ \
" movq %%mm6, %%mm0 \n\t" /* mm0 = g3 g2 g1 g0 */ \
" punpcklwd %%mm7, %%mm6 \n\t" /* mm6 = h1 g1 h0 g0 */ \
" movq %%mm4, %%mm5 \n\t" /* mm5 = f1 e1 f0 e0 */ \
" punpckldq %%mm6, %%mm4 \n\t" /* mm4 = h0 g0 f0 e0 = MM4 */ \
" punpckhdq %%mm6, %%mm5 \n\t" /* mm5 = h1 g1 f1 e1 = MM5 */ \
" movq %%mm1, %%mm6 \n\t" /* mm6 = f3 e3 f2 e2 */ \
" movq %%mm4," #op4 " \n\t" \
" punpckhwd %%mm7, %%mm0 \n\t" /* mm0 = h3 g3 h2 g2 */ \
" movq %%mm5," #op5 " \n\t" \
" punpckhdq %%mm0, %%mm6 \n\t" /* mm6 = h3 g3 f3 e3 = MM7 */ \
" movq " #op0 ", %%mm4 \n\t" /* mm4 = a3 a2 a1 a0 */ \
" punpckldq %%mm0, %%mm1 \n\t" /* mm1 = h2 g2 f2 e2 = MM6 */ \
" movq " #op1 ", %%mm5 \n\t" /* mm5 = b3 b2 b1 b0 */ \
" movq %%mm4, %%mm0 \n\t" /* mm0 = a3 a2 a1 a0 */ \
" movq %%mm6," #op7 " \n\t" \
" punpcklwd %%mm5, %%mm0 \n\t" /* mm0 = b1 a1 b0 a0 */ \
" movq %%mm1," #op6 " \n\t" \
" punpckhwd %%mm5, %%mm4 \n\t" /* mm4 = b3 a3 b2 a2 */ \
" movq %%mm2, %%mm5 \n\t" /* mm5 = c3 c2 c1 c0 */ \
" punpcklwd %%mm3, %%mm2 \n\t" /* mm2 = d1 c1 d0 c0 */ \
" movq %%mm0, %%mm1 \n\t" /* mm1 = b1 a1 b0 a0 */ \
" punpckldq %%mm2, %%mm0 \n\t" /* mm0 = d0 c0 b0 a0 = MM0 */ \
" punpckhdq %%mm2, %%mm1 \n\t" /* mm1 = d1 c1 b1 a1 = MM1 */ \
" movq %%mm4, %%mm2 \n\t" /* mm2 = b3 a3 b2 a2 */ \
" movq %%mm0," #op0 " \n\t" \
" punpckhwd %%mm3, %%mm5 \n\t" /* mm5 = d3 c3 d2 c2 */ \
" movq %%mm1," #op1 " \n\t" \
" punpckhdq %%mm5, %%mm4 \n\t" /* mm4 = d3 c3 b3 a3 = MM3 */ \
" punpckldq %%mm5, %%mm2 \n\t" /* mm2 = d2 c2 b2 a2 = MM2 */ \
" movq %%mm4," #op3 " \n\t" \
" movq %%mm2," #op2 " \n\t"
/* This performs a 2D Forward DCT on an 8x8 block with short
coefficients. We try to do the truncation to match the C
version. */
static void fdct_short__mmx ( ogg_int16_t *InputData, ogg_int16_t *OutputData)
{
ogg_int16_t __attribute__((aligned(8))) temp[8*8];
__asm__ __volatile__ (
" .p2align 4 \n\t"
/*
* Input data is an 8x8 block. To make processing of the data more efficient,
* the block is transposed and handled as two 4x8 halves.
*/
Transpose_mmx ( (%0), 16(%0), 32(%0), 48(%0), 8(%0), 24(%0), 40(%0), 56(%0),
(%1), 16(%1), 32(%1), 48(%1), 8(%1), 24(%1), 40(%1), 56(%1))
Fdct_mmx ( (%1), 16(%1), 32(%1), 48(%1), 8(%1), 24(%1), 40(%1), 56(%1), (%2))
Transpose_mmx (64(%0), 80(%0), 96(%0),112(%0), 72(%0), 88(%0),104(%0),120(%0),
64(%1), 80(%1), 96(%1),112(%1), 72(%1), 88(%1),104(%1),120(%1))
Fdct_mmx (64(%1), 80(%1), 96(%1),112(%1), 72(%1), 88(%1),104(%1),120(%1), (%2))
Transpose_mmx ( 0(%1), 16(%1), 32(%1), 48(%1), 64(%1), 80(%1), 96(%1),112(%1),
0(%1), 16(%1), 32(%1), 48(%1), 64(%1), 80(%1), 96(%1),112(%1))
Fdct_mmx ( 0(%1), 16(%1), 32(%1), 48(%1), 64(%1), 80(%1), 96(%1),112(%1), (%2))
Transpose_mmx ( 8(%1), 24(%1), 40(%1), 56(%1), 72(%1), 88(%1),104(%1),120(%1),
8(%1), 24(%1), 40(%1), 56(%1), 72(%1), 88(%1),104(%1),120(%1))
Fdct_mmx ( 8(%1), 24(%1), 40(%1), 56(%1), 72(%1), 88(%1),104(%1),120(%1), (%2))
" emms \n\t"
: "+r" (InputData),
"+r" (OutputData)
: "r" (temp),
[xC1S7] "m" (xC1S7), /* gcc 3.1+ allows named asm parameters */
[xC2S6] "m" (xC2S6),
[xC3S5] "m" (xC3S5),
[xC4S4] "m" (xC4S4),
[xC5S3] "m" (xC5S3),
[xC6S2] "m" (xC6S2),
[xC7S1] "m" (xC7S1)
: "memory"
);
}
/* install our implementation in the function table */
void dsp_mmx_fdct_init(DspFunctions *funcs)
{
funcs->fdct_short = fdct_short__mmx;
}
#endif /* USE_ASM */
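
The recurring pmulhw / paddw / psrlw $15 / paddw pattern in Fdct_mmx can be read as a fixed-point multiply by a 16-bit fraction. The sketch below models one 16-bit lane in scalar C. It is an illustration, not libtheora code: `mulhi16` and `mul_frac16` are hypothetical names, and the right shift of a negative value is assumed to be arithmetic, as it is with gcc. Constants such as xC4S4 (0xb505) have the top bit set, so pmulhw sees them as c - 65536 and the product comes out short by x; adding x back restores it, and adding the sign bit of x turns the floor into rounding toward zero.

```c
#include <stdint.h>

/* High 16 bits of a signed 16x16 multiply, as pmulhw computes per lane.
   Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t mulhi16(int16_t a, int16_t b)
{
  return (int16_t)(((int32_t)a * b) >> 16);
}

/* x * c / 65536 for a 16-bit fraction c >= 0x8000, rounded toward zero. */
static int16_t mul_frac16(int16_t x, uint16_t c)
{
  int16_t cs = (int16_t)((int32_t)c - 65536);   /* c as pmulhw sees it */
  int16_t hi = mulhi16(x, cs);                  /* floor(x*c/65536) - x */
  int16_t sign = (int16_t)((uint16_t)x >> 15);  /* 1 if x < 0, else 0 */
  return (int16_t)(hi + x + sign);
}
```

For the constants below 0x8000 (xC6S2 and xC7S1) the add-back step is not needed, which matches the macro: those products only receive the sign-bit correction.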

File diff suppressed because it is too large

View file

@ -1,182 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: recon_mmx.c 15153 2008-08-04 18:37:55Z tterribe $
********************************************************************/
#include "../codec_internal.h"
#if defined(USE_ASM)
static const __attribute__ ((aligned(8),used)) ogg_int64_t V128 = 0x8080808080808080LL;
static void copy8x8__mmx (unsigned char *src,
unsigned char *dest,
unsigned int stride)
{
__asm__ __volatile__ (
" .p2align 4 \n\t"
" lea (%2, %2, 2), %%edi \n\t"
" movq (%1), %%mm0 \n\t"
" movq (%1, %2), %%mm1 \n\t"
" movq (%1, %2, 2), %%mm2 \n\t"
" movq (%1, %%edi), %%mm3 \n\t"
" lea (%1, %2, 4), %1 \n\t"
" movq %%mm0, (%0) \n\t"
" movq %%mm1, (%0, %2) \n\t"
" movq %%mm2, (%0, %2, 2) \n\t"
" movq %%mm3, (%0, %%edi) \n\t"
" lea (%0, %2, 4), %0 \n\t"
" movq (%1), %%mm0 \n\t"
" movq (%1, %2), %%mm1 \n\t"
" movq (%1, %2, 2), %%mm2 \n\t"
" movq (%1, %%edi), %%mm3 \n\t"
" movq %%mm0, (%0) \n\t"
" movq %%mm1, (%0, %2) \n\t"
" movq %%mm2, (%0, %2, 2) \n\t"
" movq %%mm3, (%0, %%edi) \n\t"
: "+a" (dest),
"+c" (src) /* advanced by the asm, so it must be a read-write operand */
: "d" (stride)
: "memory", "edi"
);
}
static void recon_intra8x8__mmx (unsigned char *ReconPtr, ogg_int16_t *ChangePtr,
ogg_uint32_t LineStep)
{
__asm__ __volatile__ (
" .p2align 4 \n\t"
" movq %[V128], %%mm0 \n\t" /* Set mm0 to 0x8080808080808080 */
" lea 128(%1), %%edi \n\t" /* Endpoint in input buffer */
"1: \n\t"
" movq (%1), %%mm2 \n\t" /* First four input values */
" packsswb 8(%1), %%mm2 \n\t" /* pack with next(high) four values */
" por %%mm0, %%mm0 \n\t"
" pxor %%mm0, %%mm2 \n\t" /* Convert result to unsigned (same as add 128) */
" lea 16(%1), %1 \n\t" /* Step source buffer */
" cmp %%edi, %1 \n\t" /* are we done */
" movq %%mm2, (%0) \n\t" /* store results */
" lea (%0, %2), %0 \n\t" /* Step output buffer */
" jc 1b \n\t" /* Loop back if we are not done */
: "+r" (ReconPtr),
"+r" (ChangePtr) /* advanced inside the loop */
: "r" (LineStep),
[V128] "m" (V128)
: "memory", "edi"
);
}
static void recon_inter8x8__mmx (unsigned char *ReconPtr, unsigned char *RefPtr,
ogg_int16_t *ChangePtr, ogg_uint32_t LineStep)
{
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm0, %%mm0 \n\t"
" lea 128(%1), %%edi \n\t"
"1: \n\t"
" movq (%2), %%mm2 \n\t" /* (+3 misaligned) 8 reference pixels */
" movq (%1), %%mm4 \n\t" /* first 4 changes */
" movq %%mm2, %%mm3 \n\t"
" movq 8(%1), %%mm5 \n\t" /* last 4 changes */
" punpcklbw %%mm0, %%mm2 \n\t" /* turn first 4 refs into positive 16-bit #s */
" paddsw %%mm4, %%mm2 \n\t" /* add in first 4 changes */
" punpckhbw %%mm0, %%mm3 \n\t" /* turn last 4 refs into positive 16-bit #s */
" paddsw %%mm5, %%mm3 \n\t" /* add in last 4 changes */
" add %3, %2 \n\t" /* next row of reference pixels */
" packuswb %%mm3, %%mm2 \n\t" /* pack result to unsigned 8-bit values */
" lea 16(%1), %1 \n\t" /* next row of changes */
" cmp %%edi, %1 \n\t" /* are we done? */
" movq %%mm2, (%0) \n\t" /* store result */
" lea (%0, %3), %0 \n\t" /* next row of output */
" jc 1b \n\t"
: "+r" (ReconPtr),
"+r" (ChangePtr), /* both pointers are advanced inside the loop */
"+r" (RefPtr)
: "r" (LineStep)
: "memory", "edi"
);
}
static void recon_inter8x8_half__mmx (unsigned char *ReconPtr, unsigned char *RefPtr1,
unsigned char *RefPtr2, ogg_int16_t *ChangePtr,
ogg_uint32_t LineStep)
{
__asm__ __volatile__ (
" .p2align 4 \n\t"
" pxor %%mm0, %%mm0 \n\t"
" lea 128(%1), %%edi \n\t"
"1: \n\t"
" movq (%2), %%mm2 \n\t" /* (+3 misaligned) 8 reference pixels */
" movq (%3), %%mm4 \n\t" /* (+3 misaligned) 8 reference pixels */
" movq %%mm2, %%mm3 \n\t"
" punpcklbw %%mm0, %%mm2 \n\t" /* mm2 = start ref1 as positive 16-bit #s */
" movq %%mm4, %%mm5 \n\t"
" movq (%1), %%mm6 \n\t" /* first 4 changes */
" punpckhbw %%mm0, %%mm3 \n\t" /* mm3 = end ref1 as positive 16-bit #s */
" movq 8(%1), %%mm7 \n\t" /* last 4 changes */
" punpcklbw %%mm0, %%mm4 \n\t" /* mm4 = start ref2 as positive 16-bit #s */
" punpckhbw %%mm0, %%mm5 \n\t" /* mm5 = end ref2 as positive 16-bit #s */
" paddw %%mm4, %%mm2 \n\t" /* mm2 = start (ref1 + ref2) */
" paddw %%mm5, %%mm3 \n\t" /* mm3 = end (ref1 + ref2) */
" psrlw $1, %%mm2 \n\t" /* mm2 = start (ref1 + ref2)/2 */
" psrlw $1, %%mm3 \n\t" /* mm3 = end (ref1 + ref2)/2 */
" paddw %%mm6, %%mm2 \n\t" /* add changes to start */
" paddw %%mm7, %%mm3 \n\t" /* add changes to end */
" lea 16(%1), %1 \n\t" /* next row of changes */
" packuswb %%mm3, %%mm2 \n\t" /* pack start|end to unsigned 8-bit */
" add %4, %2 \n\t" /* next row of reference pixels */
" add %4, %3 \n\t" /* next row of reference pixels */
" movq %%mm2, (%0) \n\t" /* store result */
" add %4, %0 \n\t" /* next row of output */
" cmp %%edi, %1 \n\t" /* are we done? */
" jc 1b \n\t"
: "+r" (ReconPtr),
"+r" (ChangePtr), /* all three pointers are advanced inside the loop */
"+r" (RefPtr1),
"+r" (RefPtr2)
: "m" (LineStep)
: "memory", "edi"
);
}
void dsp_mmx_recon_init(DspFunctions *funcs)
{
funcs->copy8x8 = copy8x8__mmx;
funcs->recon_intra8x8 = recon_intra8x8__mmx;
funcs->recon_inter8x8 = recon_inter8x8__mmx;
funcs->recon_inter8x8_half = recon_inter8x8_half__mmx;
}
#endif /* USE_ASM */
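
The saturating pack instructions in the reconstruction routines amount to a clamp to the 0..255 pixel range. A scalar sketch of recon_intra8x8 and recon_inter8x8 follows. It is an illustration only: `clamp255`, `recon_intra8x8_ref` and `recon_inter8x8_ref` are hypothetical names, `line_step` is assumed to be a byte stride shared by the output and the reference, and the residual block is assumed to be 8 contiguous rows of 8 coefficients.

```c
#include <stdint.h>

/* Clamp an integer to the unsigned 8-bit pixel range. */
static uint8_t clamp255(int v)
{
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Intra: residual plus a bias of 128 (packsswb followed by xor 0x80). */
static void recon_intra8x8_ref(uint8_t *recon, const int16_t *change,
                               uint32_t line_step)
{
  int row, col;
  for (row = 0; row < 8; row++) {
    for (col = 0; col < 8; col++)
      recon[col] = clamp255(change[col] + 128);
    recon += line_step;
    change += 8;
  }
}

/* Inter: reference pixel plus residual (paddsw followed by packuswb). */
static void recon_inter8x8_ref(uint8_t *recon, const uint8_t *ref,
                               const int16_t *change, uint32_t line_step)
{
  int row, col;
  for (row = 0; row < 8; row++) {
    for (col = 0; col < 8; col++)
      recon[col] = clamp255(ref[col] + change[col]);
    recon += line_step;
    ref += line_step;
    change += 8;
  }
}
```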

File diff suppressed because it is too large

View file

@ -1,333 +0,0 @@
;//==========================================================================
;//
;// THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY
;// KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
;// IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR
;// PURPOSE.
;//
;// Copyright (c) 1999 - 2001 On2 Technologies Inc. All Rights Reserved.
;//
;//--------------------------------------------------------------------------
#include "theora/theora.h"
#include "../codec_internal.h"
#include "../dsp.h"
static const ogg_int64_t xC1S7 = 0x0fb15fb15fb15fb15;
static const ogg_int64_t xC2S6 = 0x0ec83ec83ec83ec83;
static const ogg_int64_t xC3S5 = 0x0d4dbd4dbd4dbd4db;
static const ogg_int64_t xC4S4 = 0x0b505b505b505b505;
static const ogg_int64_t xC5S3 = 0x08e3a8e3a8e3a8e3a;
static const ogg_int64_t xC6S2 = 0x061f861f861f861f8;
static const ogg_int64_t xC7S1 = 0x031f131f131f131f1;
static __inline void Transpose_mmx( ogg_int16_t *InputData1, ogg_int16_t *OutputData1,
ogg_int16_t *InputData2, ogg_int16_t *OutputData2)
{
__asm {
align 16
mov eax, InputData1
mov ebx, InputData2
mov ecx, OutputData1
mov edx, OutputData2
movq mm0, [eax] ; /* mm0 = a0 a1 a2 a3 */
movq mm4, [ebx] ; /* mm4 = e4 e5 e6 e7 */
movq mm1, [16 + eax] ; /* mm1 = b0 b1 b2 b3 */
movq mm5, [16 + ebx] ; /* mm5 = f4 f5 f6 f7 */
movq mm2, [32 + eax] ; /* mm2 = c0 c1 c2 c3 */
movq mm6, [32 + ebx] ; /* mm6 = g4 g5 g6 g7 */
movq mm3, [48 + eax] ; /* mm3 = d0 d1 d2 d3 */
movq [16 + ecx], mm1 ; /* save b0 b1 b2 b3 */
movq mm7, [48 + ebx] ; /* mm7 = h0 h1 h2 h3 */
; /* Transpose 2x8 block */
movq mm1, mm4 ; /* mm1 = e3 e2 e1 e0 */
punpcklwd mm4, mm5 ; /* mm4 = f1 e1 f0 e0 */
movq [ecx], mm0 ; /* save a3 a2 a1 a0 */
punpckhwd mm1, mm5 ; /* mm1 = f3 e3 f2 e2 */
movq mm0, mm6 ; /* mm0 = g3 g2 g1 g0 */
punpcklwd mm6, mm7 ; /* mm6 = h1 g1 h0 g0 */
movq mm5, mm4 ; /* mm5 = f1 e1 f0 e0 */
punpckldq mm4, mm6 ; /* mm4 = h0 g0 f0 e0 = MM4 */
punpckhdq mm5, mm6 ; /* mm5 = h1 g1 f1 e1 = MM5 */
movq mm6, mm1 ; /* mm6 = f3 e3 f2 e2 */
movq [edx], mm4 ;
punpckhwd mm0, mm7 ; /* mm0 = h3 g3 h2 g2 */
movq [16 + edx], mm5 ;
punpckhdq mm6, mm0 ; /* mm6 = h3 g3 f3 e3 = MM7 */
movq mm4, [ecx] ; /* mm4 = a3 a2 a1 a0 */
punpckldq mm1, mm0 ; /* mm1 = h2 g2 f2 e2 = MM6 */
movq mm5, [16 + ecx] ; /* mm5 = b3 b2 b1 b0 */
movq mm0, mm4 ; /* mm0 = a3 a2 a1 a0 */
movq [48 + edx], mm6 ;
punpcklwd mm0, mm5 ; /* mm0 = b1 a1 b0 a0 */
movq [32 + edx], mm1 ;
punpckhwd mm4, mm5 ; /* mm4 = b3 a3 b2 a2 */
movq mm5, mm2 ; /* mm5 = c3 c2 c1 c0 */
punpcklwd mm2, mm3 ; /* mm2 = d1 c1 d0 c0 */
movq mm1, mm0 ; /* mm1 = b1 a1 b0 a0 */
punpckldq mm0, mm2 ; /* mm0 = d0 c0 b0 a0 = MM0 */
punpckhdq mm1, mm2 ; /* mm1 = d1 c1 b1 a1 = MM1 */
movq mm2, mm4 ; /* mm2 = b3 a3 b2 a2 */
movq [ecx], mm0 ;
punpckhwd mm5, mm3 ; /* mm5 = d3 c3 d2 c2 */
movq [16 + ecx], mm1 ;
punpckhdq mm4, mm5 ; /* mm4 = d3 c3 b3 a3 = MM3 */
punpckldq mm2, mm5 ; /* mm2 = d2 c2 b2 a2 = MM2 */
movq [48 + ecx], mm4 ;
movq [32 + ecx], mm2 ;
};
}
static __inline void Fdct_mmx( ogg_int16_t *InputData1, ogg_int16_t *InputData2, ogg_int16_t *temp)
{
__asm {
align 16
mov eax, InputData1
mov ebx, InputData2
mov ecx, temp
movq mm0, [eax] ;
movq mm1, [16 + eax] ;
movq mm2, [48 + eax] ;
movq mm3, [16 + ebx] ;
movq mm4, mm0 ;
movq mm5, mm1 ;
movq mm6, mm2 ;
movq mm7, mm3 ;
;
paddsw mm0, [48 + ebx] ; /* mm0 = ip0 + ip7 = is07 */
paddsw mm1, [32 + eax] ; /* mm1 = ip1 + ip2 = is12 */
paddsw mm2, [ebx] ; /* mm2 = ip3 + ip4 = is34 */
paddsw mm3, [32 + ebx] ; /* mm3 = ip5 + ip6 = is56 */
psubsw mm4, [48 + ebx] ; /* mm4 = ip0 - ip7 = id07 */
psubsw mm5, [32 + eax] ; /* mm5 = ip1 - ip2 = id12 */
;
psubsw mm0, mm2 ; /* mm0 = is07 - is34 */
;
paddsw mm2, mm2 ;
;
psubsw mm6, [ebx] ; /* mm6 = ip3 - ip4 = id34 */
;
paddsw mm2, mm0 ; /* mm2 = is07 + is34 = is0734 */
psubsw mm1, mm3 ; /* mm1 = is12 - is56 */
movq [ecx], mm0 ; /* Save is07 - is34 to free mm0; */
paddsw mm3, mm3 ;
paddsw mm3, mm1 ; /* mm3 = is12 + is56 = is1256 */
;
psubsw mm7, [32 + ebx] ; /* mm7 = ip5 - ip6 = id56 */
; /* ------------------------------------------------------------------- */
psubsw mm5, mm7 ; /* mm5 = id12 - id56 */
paddsw mm7, mm7 ;
paddsw mm7, mm5 ; /* mm7 = id12 + id56 */
; /* ------------------------------------------------------------------- */
psubsw mm2, mm3 ; /* mm2 = is0734 - is1256 */
paddsw mm3, mm3 ;
;
movq mm0, mm2 ; /* make a copy */
paddsw mm3, mm2 ; /* mm3 = is0734 + is1256 */
;
pmulhw mm0, xC4S4 ; /* mm0 = xC4S4 * ( is0734 - is1256 ) - ( is0734 - is1256 ) */
paddw mm0, mm2 ; /* mm0 = xC4S4 * ( is0734 - is1256 ) */
psrlw mm2, 15 ;
paddw mm0, mm2 ; /* Truncate mm0, now it is op[4] */
;
movq mm2, mm3 ;
movq [ebx], mm0 ; /* save ip4, now mm0,mm2 are free */
;
movq mm0, mm3 ;
pmulhw mm3, xC4S4 ; /* mm3 = xC4S4 * ( is0734 +is1256 ) - ( is0734 +is1256 ) */
;
psrlw mm2, 15 ;
paddw mm3, mm0 ; /* mm3 = xC4S4 * ( is0734 +is1256 ) */
paddw mm3, mm2 ; /* Truncate mm3, now it is op[0] */
;
movq [eax], mm3 ;
; /* ------------------------------------------------------------------- */
movq mm3, [ecx] ; /* mm3 = irot_input_y */
pmulhw mm3, xC2S6 ; /* mm3 = xC2S6 * irot_input_y - irot_input_y */
;
movq mm2, [ecx] ;
movq mm0, mm2 ;
;
psrlw mm2, 15 ;
paddw mm3, mm0 ; /* mm3 = xC2S6 * irot_input_y */
;
paddw mm3, mm2 ; /* Truncated */
movq mm0, mm5 ;
;
movq mm2, mm5 ;
pmulhw mm0, xC6S2 ; /* mm0 = xC6S2 * irot_input_x */
;
psrlw mm2, 15 ;
paddw mm0, mm2 ; /* Truncated */
;
paddsw mm3, mm0 ; /* ip[2] */
movq [32 + eax], mm3 ; /* Save ip2 */
;
movq mm0, mm5 ;
movq mm2, mm5 ;
;
pmulhw mm5, xC2S6 ; /* mm5 = xC2S6 * irot_input_x - irot_input_x */
psrlw mm2, 15 ;
;
movq mm3, [ecx] ;
paddw mm5, mm0 ; /* mm5 = xC2S6 * irot_input_x */
;
paddw mm5, mm2 ; /* Truncated */
movq mm2, mm3 ;
;
pmulhw mm3, xC6S2 ; /* mm3 = xC6S2 * irot_input_y */
psrlw mm2, 15 ;
;
paddw mm3, mm2 ; /* Truncated */
psubsw mm3, mm5 ;
;
movq [32 + ebx], mm3 ;
; /* ------------------------------------------------------------------- */
movq mm0, xC4S4 ;
movq mm2, mm1 ;
movq mm3, mm1 ;
;
pmulhw mm1, mm0 ; /* mm1 = xC4S4 * ( is12 - is56 ) - ( is12 - is56 ) */
psrlw mm2, 15 ;
;
paddw mm1, mm3 ; /* mm1 = xC4S4 * ( is12 - is56 ) */
paddw mm1, mm2 ; /* Truncate mm1, now it is icommon_product1 */
;
movq mm2, mm7 ;
movq mm3, mm7 ;
;
pmulhw mm7, mm0 ; /* mm7 = xC4S4 * ( id12 + id56 ) - ( id12 + id56 ) */
psrlw mm2, 15 ;
;
paddw mm7, mm3 ; /* mm7 = xC4S4 * ( id12 + id56 ) */
paddw mm7, mm2 ; /* Truncate mm7, now it is icommon_product2 */
; /* ------------------------------------------------------------------- */
pxor mm0, mm0 ; /* Clear mm0 */
psubsw mm0, mm6 ; /* mm0 = - id34 */
;
psubsw mm0, mm7 ; /* mm0 = - ( id34 + idcommon_product2 ) */
paddsw mm6, mm6 ;
paddsw mm6, mm0 ; /* mm6 = id34 - icommon_product2 */
;
psubsw mm4, mm1 ; /* mm4 = id07 - icommon_product1 */
paddsw mm1, mm1 ;
paddsw mm1, mm4 ; /* mm1 = id07 + icommon_product1 */
; /* ------------------------------------------------------------------- */
movq mm7, xC1S7 ;
movq mm2, mm1 ;
;
movq mm3, mm1 ;
pmulhw mm1, mm7 ; /* mm1 = xC1S7 * irot_input_x - irot_input_x */
;
movq mm7, xC7S1 ;
psrlw mm2, 15 ;
;
paddw mm1, mm3 ; /* mm1 = xC1S7 * irot_input_x */
paddw mm1, mm2 ; /* Truncated */
;
pmulhw mm3, mm7 ; /* mm3 = xC7S1 * irot_input_x */
paddw mm3, mm2 ; /* Truncated */
;
movq mm5, mm0 ;
movq mm2, mm0 ;
;
movq mm7, xC1S7 ;
pmulhw mm0, mm7 ; /* mm0 = xC1S7 * irot_input_y - irot_input_y */
;
movq mm7, xC7S1 ;
psrlw mm2, 15 ;
;
paddw mm0, mm5 ; /* mm0 = xC1S7 * irot_input_y */
paddw mm0, mm2 ; /* Truncated */
;
pmulhw mm5, mm7 ; /* mm5 = xC7S1 * irot_input_y */
paddw mm5, mm2 ; /* Truncated */
;
psubsw mm1, mm5 ; /* mm1 = xC1S7 * irot_input_x - xC7S1 * irot_input_y = ip1 */
paddsw mm3, mm0 ; /* mm3 = xC7S1 * irot_input_x - xC1S7 * irot_input_y = ip7 */
;
movq [16 + eax], mm1 ;
movq [48 + ebx], mm3 ;
; /* ------------------------------------------------------------------- */
movq mm0, xC3S5 ;
movq mm1, xC5S3 ;
;
movq mm5, mm6 ;
movq mm7, mm6 ;
;
movq mm2, mm4 ;
movq mm3, mm4 ;
;
pmulhw mm4, mm0 ; /* mm4 = xC3S5 * irot_input_x - irot_input_x */
pmulhw mm6, mm1 ; /* mm6 = xC5S3 * irot_input_y - irot_input_y */
;
psrlw mm2, 15 ;
psrlw mm5, 15 ;
;
paddw mm4, mm3 ; /* mm4 = xC3S5 * irot_input_x */
paddw mm6, mm7 ; /* mm6 = xC5S3 * irot_input_y */
;
paddw mm4, mm2 ; /* Truncated */
paddw mm6, mm5 ; /* Truncated */
;
psubsw mm4, mm6 ; /* ip3 */
movq [48 + eax], mm4 ;
;
movq mm4, mm3 ;
movq mm6, mm7 ;
;
pmulhw mm3, mm1 ; /* mm3 = xC5S3 * irot_input_x - irot_input_x */
pmulhw mm7, mm0 ; /* mm7 = xC3S5 * irot_input_y - irot_input_y */
;
paddw mm4, mm2 ;
paddw mm6, mm5 ;
;
paddw mm3, mm4 ; /* mm3 = xC5S3 * irot_input_x */
paddw mm7, mm6 ; /* mm7 = xC3S5 * irot_input_y */
;
paddw mm3, mm7 ; /* ip5 */
movq [16 + ebx], mm3 ;
};
}
static void fdct_short__mmx ( ogg_int16_t *InputData, ogg_int16_t *OutputData)
{
static ogg_int16_t tmp[32];
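/* Round tmp up to a 16-byte boundary. '-' binds tighter than '&', so the
added parentheses only make the existing evaluation order explicit. */
ogg_int16_t* align_tmp = (ogg_int16_t*)((unsigned char*)tmp + ((16 - (int)tmp) & 15));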
Transpose_mmx(InputData, OutputData, InputData + 4, OutputData + 4);
Fdct_mmx(OutputData, OutputData + 4, align_tmp);
Transpose_mmx(InputData + 32, OutputData + 32, InputData + 36, OutputData + 36);
Fdct_mmx(OutputData+32, OutputData + 36, align_tmp);
Transpose_mmx(OutputData, OutputData, OutputData + 32, OutputData + 32);
Fdct_mmx(OutputData, OutputData + 32, align_tmp);
Transpose_mmx(OutputData + 4, OutputData + 4, OutputData + 36, OutputData + 36);
Fdct_mmx(OutputData + 4, OutputData + 36, align_tmp);
__asm emms
}
void dsp_mmx_fdct_init(DspFunctions *funcs)
{
funcs->fdct_short = fdct_short__mmx;
}
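
The `pmulhw` / `paddw` / `psrlw 15` groups in Fdct_mmx above all follow one fixed-point pattern. What follows is a scalar sketch of that pattern for a single 16-bit lane, assuming the cosine constants are fractions scaled by 65536. The names `floor_shift16` and `mmx_mul_const` are hypothetical, and the value 46341 used for xC4S4 in the note below is an assumption, since the constant tables are defined outside this hunk.

```c
#include <stdint.h>

/* Floor of p / 65536, written without relying on how the compiler
   right-shifts negative values. */
static int32_t floor_shift16(int32_t p)
{
    return p >= 0 ? p >> 16 : -((-p + 65535) >> 16);
}

/* Scalar model of one lane of the multiply-by-constant idiom.
   c is the constant as an unsigned 16-bit fraction (c / 65536). */
static int16_t mmx_mul_const(int16_t x, uint16_t c)
{
    /* pmulhw treats both operands as signed, so a constant >= 0x8000
       is seen as c - 65536. */
    int32_t cs = c >= 0x8000 ? (int32_t)c - 0x10000 : (int32_t)c;
    int32_t r = floor_shift16((int32_t)x * cs);   /* pmulhw */
    if (c >= 0x8000)
        r += x;                 /* paddw: cancels the "- x" bias */
    r += (uint16_t)x >> 15;     /* psrlw 15 + paddw: +1 for negative x */
    return (int16_t)r;
}
```

With c = 46341, `mmx_mul_const(100, 46341)` gives 70 and `mmx_mul_const(-100, 46341)` gives -70. In this example the sign-bit correction makes the negative input round toward zero rather than toward minus infinity, which is what the "Truncated" comments refer to.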

View file

@ -1,197 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: reconstruct.c,v 1.6 2003/12/03 08:59:41 arc Exp $
********************************************************************/
#include "../codec_internal.h"
static const unsigned __int64 V128 = 0x8080808080808080;
static void copy8x8__mmx (unsigned char *src,
unsigned char *dest,
unsigned int stride)
{
//Is this even the fastest way to do this?
__asm {
align 16
mov eax, src
mov ebx, dest
mov ecx, stride
lea edi, [ecx + ecx * 2]
movq mm0, [eax]
movq mm1, [eax + ecx]
movq mm2, [eax + ecx * 2]
movq mm3, [eax + edi]
lea eax, [eax + ecx * 4]
movq [ebx], mm0
movq [ebx + ecx], mm1
movq [ebx + ecx * 2], mm2
movq [ebx + edi], mm3
lea ebx, [ebx + ecx * 4]
movq mm0, [eax]
movq mm1, [eax + ecx]
movq mm2, [eax + ecx * 2]
movq mm3, [eax + edi]
movq [ebx], mm0
movq [ebx + ecx], mm1
movq [ebx + ecx * 2], mm2
movq [ebx + edi], mm3
};
}
static void recon_intra8x8__mmx (unsigned char *ReconPtr, ogg_int16_t *ChangePtr,
ogg_uint32_t LineStep)
{
__asm {
align 16
mov eax, ReconPtr
mov ebx, ChangePtr
mov ecx, LineStep
movq mm0, V128
lea edi, [128 + ebx]
loop_start:
movq mm2, [ebx]
packsswb mm2, [8 + ebx]
por mm0, mm0
pxor mm2, mm0
lea ebx, [16 + ebx]
cmp ebx, edi
movq [eax], mm2
lea eax, [eax + ecx]
jc loop_start
};
}
static void recon_inter8x8__mmx (unsigned char *ReconPtr, unsigned char *RefPtr,
ogg_int16_t *ChangePtr, ogg_uint32_t LineStep)
{
__asm {
align 16
mov eax, ReconPtr
mov ebx, ChangePtr
mov ecx, LineStep
mov edx, RefPtr
pxor mm0, mm0
lea edi, [128 + ebx]
loop_start:
movq mm2, [edx]
movq mm4, [ebx]
movq mm3, mm2
movq mm5, [8 + ebx]
punpcklbw mm2, mm0
paddsw mm2, mm4
punpckhbw mm3, mm0
paddsw mm3, mm5
add edx, ecx
packuswb mm2, mm3
lea ebx, [16 + ebx]
cmp ebx, edi
movq [eax], mm2
lea eax, [eax + ecx]
jc loop_start
};
}
static void recon_inter8x8_half__mmx (unsigned char *ReconPtr, unsigned char *RefPtr1,
unsigned char *RefPtr2, ogg_int16_t *ChangePtr,
ogg_uint32_t LineStep)
{
__asm {
align 16
mov eax, ReconPtr
mov ebx, ChangePtr
mov ecx, RefPtr1
mov edx, RefPtr2
pxor mm0, mm0
lea edi, [128 + ebx]
loop_start:
movq mm2, [ecx]
movq mm4, [edx]
movq mm3, mm2
punpcklbw mm2, mm0
movq mm5, mm4
movq mm6, [ebx]
punpckhbw mm3, mm0
movq mm7, [8 + ebx]
punpcklbw mm4, mm0
punpckhbw mm5, mm0
paddw mm2, mm4
paddw mm3, mm5
psrlw mm2, 1
psrlw mm3, 1
paddw mm2, mm6
paddw mm3, mm7
lea ebx, [16 + ebx]
packuswb mm2, mm3
add ecx, LineStep
add edx, LineStep
movq [eax], mm2
add eax, LineStep
cmp ebx, edi
jc loop_start
};
}
void dsp_mmx_recon_init(DspFunctions *funcs)
{
funcs->copy8x8 = copy8x8__mmx;
funcs->recon_intra8x8 = recon_intra8x8__mmx;
funcs->recon_inter8x8 = recon_inter8x8__mmx;
funcs->recon_inter8x8_half = recon_inter8x8_half__mmx;
}
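
For comparison with recon_intra8x8__mmx above, here is a scalar sketch of the same operation: each residual is biased by 128 and clamped to 0..255. The second helper models, for a single value, the `packsswb` followed by `pxor` with 0x80 that the MMX loop uses to reach the same result. All function names are hypothetical, and plain `<stdint.h>` types stand in for the `ogg_*` typedefs.

```c
#include <stdint.h>

static uint8_t clamp255(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Reference behaviour: recon = clamp(change + 128) for an 8x8 block. */
static void recon_intra8x8_ref(uint8_t *recon, const int16_t *change,
                               uint32_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            recon[x] = clamp255(change[x] + 128);
        recon += stride;   /* next output row */
        change += 8;       /* residuals are stored contiguously */
    }
}

/* One lane of the MMX trick: saturate to signed 8 bits, then flip
   the top bit, which adds 128 modulo 256. */
static uint8_t pack_xor_128(int16_t v)
{
    int s = v < -128 ? -128 : v > 127 ? 127 : v;   /* packsswb */
    return (uint8_t)((uint8_t)s ^ 0x80);           /* pxor with 0x80 */
}
```

Both paths agree because signed saturation to [-128, 127] followed by a +128 bias covers exactly the range 0..255.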

View file

@ -1,409 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2008 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dct_decode_mmx.c 15400 2008-10-15 12:10:58Z tterribe $
********************************************************************/
#include <stdlib.h>
#include "../codec_internal.h"
#if defined(USE_ASM)
static const __attribute__((aligned(8),used)) ogg_int64_t OC_V3=
0x0003000300030003LL;
static const __attribute__((aligned(8),used)) ogg_int64_t OC_V4=
0x0004000400040004LL;
static void loop_filter_v(unsigned char *_pix,int _ystride,
const ogg_int16_t *_ll){
long esi;
_pix-=_ystride*2;
__asm__ __volatile__(
/*mm0=0*/
"pxor %%mm0,%%mm0\n\t"
/*esi=_ystride*3*/
"lea (%[ystride],%[ystride],2),%[s]\n\t"
/*mm7=_pix[0...8]*/
"movq (%[pix]),%%mm7\n\t"
/*mm4=_pix[0...8+_ystride*3]*/
"movq (%[pix],%[s]),%%mm4\n\t"
/*mm6=_pix[0...8]*/
"movq %%mm7,%%mm6\n\t"
/*Expand unsigned _pix[0...3] to 16 bits.*/
"punpcklbw %%mm0,%%mm6\n\t"
"movq %%mm4,%%mm5\n\t"
/*Expand unsigned _pix[4...8] to 16 bits.*/
"punpckhbw %%mm0,%%mm7\n\t"
/*Expand other arrays too.*/
"punpcklbw %%mm0,%%mm4\n\t"
"punpckhbw %%mm0,%%mm5\n\t"
/*mm7:mm6=_p[0...8]-_p[0...8+_ystride*3]:*/
"psubw %%mm4,%%mm6\n\t"
"psubw %%mm5,%%mm7\n\t"
/*mm5=mm4=_pix[0...8+_ystride]*/
"movq (%[pix],%[ystride]),%%mm4\n\t"
/*mm1=mm3=mm2=_pix[0...8+_ystride*2]*/
"movq (%[pix],%[ystride],2),%%mm2\n\t"
"movq %%mm4,%%mm5\n\t"
"movq %%mm2,%%mm3\n\t"
"movq %%mm2,%%mm1\n\t"
/*Expand these arrays.*/
"punpckhbw %%mm0,%%mm5\n\t"
"punpcklbw %%mm0,%%mm4\n\t"
"punpckhbw %%mm0,%%mm3\n\t"
"punpcklbw %%mm0,%%mm2\n\t"
/*Preload...*/
"movq %[OC_V3],%%mm0\n\t"
/*mm3:mm2=_pix[0...8+_ystride*2]-_pix[0...8+_ystride]*/
"psubw %%mm5,%%mm3\n\t"
"psubw %%mm4,%%mm2\n\t"
/*Scale by 3.*/
"pmullw %%mm0,%%mm3\n\t"
"pmullw %%mm0,%%mm2\n\t"
/*Preload...*/
"movq %[OC_V4],%%mm0\n\t"
/*f=mm3:mm2==_pix[0...8]-_pix[0...8+_ystride*3]+
3*(_pix[0...8+_ystride*2]-_pix[0...8+_ystride])*/
"paddw %%mm7,%%mm3\n\t"
"paddw %%mm6,%%mm2\n\t"
/*Add 4.*/
"paddw %%mm0,%%mm3\n\t"
"paddw %%mm0,%%mm2\n\t"
/*"Divide" by 8.*/
"psraw $3,%%mm3\n\t"
"psraw $3,%%mm2\n\t"
/*Now compute lflim of mm3:mm2 cf. Section 7.10 of the spec.*/
/*Free up mm5.*/
"packuswb %%mm5,%%mm4\n\t"
/*mm0=L L L L*/
"movq (%[ll]),%%mm0\n\t"
/*if(R_i<-2L||R_i>2L)R_i=0:*/
"movq %%mm2,%%mm5\n\t"
"pxor %%mm6,%%mm6\n\t"
"movq %%mm0,%%mm7\n\t"
"psubw %%mm0,%%mm6\n\t"
"psllw $1,%%mm7\n\t"
"psllw $1,%%mm6\n\t"
/*mm2==R_3 R_2 R_1 R_0*/
/*mm5==R_3 R_2 R_1 R_0*/
/*mm6==-2L -2L -2L -2L*/
/*mm7==2L 2L 2L 2L*/
"pcmpgtw %%mm2,%%mm7\n\t"
"pcmpgtw %%mm6,%%mm5\n\t"
"pand %%mm7,%%mm2\n\t"
"movq %%mm0,%%mm7\n\t"
"pand %%mm5,%%mm2\n\t"
"psllw $1,%%mm7\n\t"
"movq %%mm3,%%mm5\n\t"
/*mm3==R_7 R_6 R_5 R_4*/
/*mm5==R_7 R_6 R_5 R_4*/
/*mm6==-2L -2L -2L -2L*/
/*mm7==2L 2L 2L 2L*/
"pcmpgtw %%mm3,%%mm7\n\t"
"pcmpgtw %%mm6,%%mm5\n\t"
"pand %%mm7,%%mm3\n\t"
"movq %%mm0,%%mm7\n\t"
"pand %%mm5,%%mm3\n\t"
/*if(R_i<-L)R_i'=R_i+2L;
if(R_i>L)R_i'=R_i-2L;
if(R_i<-L||R_i>L)R_i=-R_i':*/
"psraw $1,%%mm6\n\t"
"movq %%mm2,%%mm5\n\t"
"psllw $1,%%mm7\n\t"
/*mm2==R_3 R_2 R_1 R_0*/
/*mm5==R_3 R_2 R_1 R_0*/
/*mm6==-L -L -L -L*/
/*mm0==L L L L*/
/*mm5=R_i>L?FF:00*/
"pcmpgtw %%mm0,%%mm5\n\t"
/*mm6=-L>R_i?FF:00*/
"pcmpgtw %%mm2,%%mm6\n\t"
/*mm7=R_i>L?2L:0*/
"pand %%mm5,%%mm7\n\t"
/*mm2=R_i>L?R_i-2L:R_i*/
"psubw %%mm7,%%mm2\n\t"
"movq %%mm0,%%mm7\n\t"
/*mm5=-L>R_i||R_i>L*/
"por %%mm6,%%mm5\n\t"
"psllw $1,%%mm7\n\t"
/*mm7=-L>R_i?2L:0*/
"pand %%mm6,%%mm7\n\t"
"pxor %%mm6,%%mm6\n\t"
/*mm2=-L>R_i?R_i+2L:R_i*/
"paddw %%mm7,%%mm2\n\t"
"psubw %%mm0,%%mm6\n\t"
/*mm5=-L>R_i||R_i>L?-R_i':0*/
"pand %%mm2,%%mm5\n\t"
"movq %%mm0,%%mm7\n\t"
/*mm2=-L>R_i||R_i>L?0:R_i*/
"psubw %%mm5,%%mm2\n\t"
"psllw $1,%%mm7\n\t"
/*mm2=-L>R_i||R_i>L?-R_i':R_i*/
"psubw %%mm5,%%mm2\n\t"
"movq %%mm3,%%mm5\n\t"
/*mm3==R_7 R_6 R_5 R_4*/
/*mm5==R_7 R_6 R_5 R_4*/
/*mm6==-L -L -L -L*/
/*mm0==L L L L*/
/*mm6=-L>R_i?FF:00*/
"pcmpgtw %%mm3,%%mm6\n\t"
/*mm5=R_i>L?FF:00*/
"pcmpgtw %%mm0,%%mm5\n\t"
/*mm7=R_i>L?2L:0*/
"pand %%mm5,%%mm7\n\t"
/*mm3=R_i>L?R_i-2L:R_i*/
"psubw %%mm7,%%mm3\n\t"
"psllw $1,%%mm0\n\t"
/*mm5=-L>R_i||R_i>L*/
"por %%mm6,%%mm5\n\t"
/*mm0=-L>R_i?2L:0*/
"pand %%mm6,%%mm0\n\t"
/*mm3=-L>R_i?R_i+2L:R_i*/
"paddw %%mm0,%%mm3\n\t"
/*mm5=-L>R_i||R_i>L?-R_i':0*/
"pand %%mm3,%%mm5\n\t"
/*mm3=-L>R_i||R_i>L?0:R_i*/
"psubw %%mm5,%%mm3\n\t"
/*mm3=-L>R_i||R_i>L?-R_i':R_i*/
"psubw %%mm5,%%mm3\n\t"
/*Unfortunately, there's no unsigned byte+signed byte with unsigned
saturation op code, so we have to promote things back 16 bits.*/
"pxor %%mm0,%%mm0\n\t"
"movq %%mm4,%%mm5\n\t"
"punpcklbw %%mm0,%%mm4\n\t"
"punpckhbw %%mm0,%%mm5\n\t"
"movq %%mm1,%%mm6\n\t"
"punpcklbw %%mm0,%%mm1\n\t"
"punpckhbw %%mm0,%%mm6\n\t"
/*_pix[0...8+_ystride]+=R_i*/
"paddw %%mm2,%%mm4\n\t"
"paddw %%mm3,%%mm5\n\t"
/*_pix[0...8+_ystride*2]-=R_i*/
"psubw %%mm2,%%mm1\n\t"
"psubw %%mm3,%%mm6\n\t"
"packuswb %%mm5,%%mm4\n\t"
"packuswb %%mm6,%%mm1\n\t"
/*Write it back out.*/
"movq %%mm4,(%[pix],%[ystride])\n\t"
"movq %%mm1,(%[pix],%[ystride],2)\n\t"
:[s]"=&S"(esi)
:[pix]"r"(_pix),[ystride]"r"((long)_ystride),[ll]"r"(_ll),
[OC_V3]"m"(OC_V3),[OC_V4]"m"(OC_V4)
:"memory"
);
}
/*This code implements the bulk of loop_filter_h().
Data are striped p0 p1 p2 p3 ... p0 p1 p2 p3 ..., so in order to load all
four p0's to one register we must transpose the values in four mmx regs.
When half is done we repeat this for the rest.*/
static void loop_filter_h4(unsigned char *_pix,long _ystride,
const ogg_int16_t *_ll){
long esi;
long edi;
__asm__ __volatile__(
/*x x x x 3 2 1 0*/
"movd (%[pix]),%%mm0\n\t"
/*esi=_ystride*3*/
"lea (%[ystride],%[ystride],2),%[s]\n\t"
/*x x x x 7 6 5 4*/
"movd (%[pix],%[ystride]),%%mm1\n\t"
/*x x x x B A 9 8*/
"movd (%[pix],%[ystride],2),%%mm2\n\t"
/*x x x x F E D C*/
"movd (%[pix],%[s]),%%mm3\n\t"
/*mm0=7 3 6 2 5 1 4 0*/
"punpcklbw %%mm1,%%mm0\n\t"
/*mm2=F B E A D 9 C 8*/
"punpcklbw %%mm3,%%mm2\n\t"
/*mm1=7 3 6 2 5 1 4 0*/
"movq %%mm0,%%mm1\n\t"
/*mm0=F B 7 3 E A 6 2*/
"punpckhwd %%mm2,%%mm0\n\t"
/*mm1=D 9 5 1 C 8 4 0*/
"punpcklwd %%mm2,%%mm1\n\t"
"pxor %%mm7,%%mm7\n\t"
/*mm5=D 9 5 1 C 8 4 0*/
"movq %%mm1,%%mm5\n\t"
/*mm1=x C x 8 x 4 x 0==pix[0]*/
"punpcklbw %%mm7,%%mm1\n\t"
/*mm5=x D x 9 x 5 x 1==pix[1]*/
"punpckhbw %%mm7,%%mm5\n\t"
/*mm3=F B 7 3 E A 6 2*/
"movq %%mm0,%%mm3\n\t"
/*mm0=x E x A x 6 x 2==pix[2]*/
"punpcklbw %%mm7,%%mm0\n\t"
/*mm3=x F x B x 7 x 3==pix[3]*/
"punpckhbw %%mm7,%%mm3\n\t"
/*mm1=mm1-mm3==pix[0]-pix[3]*/
"psubw %%mm3,%%mm1\n\t"
/*Save a copy of pix[2] for later.*/
"movq %%mm0,%%mm4\n\t"
/*mm0=mm0-mm5==pix[2]-pix[1]*/
"psubw %%mm5,%%mm0\n\t"
/*Scale by 3.*/
"pmullw %[OC_V3],%%mm0\n\t"
/*f=mm0==_pix[0]-_pix[3]+3*(_pix[2]-_pix[1])*/
"paddw %%mm1,%%mm0\n\t"
/*Add 4.*/
"paddw %[OC_V4],%%mm0\n\t"
/*"Divide" by 8, producing the residuals R_i.*/
"psraw $3,%%mm0\n\t"
/*Now compute lflim of mm0 cf. Section 7.10 of the spec.*/
/*mm6=L L L L*/
"movq (%[ll]),%%mm6\n\t"
/*if(R_i<-2L||R_i>2L)R_i=0:*/
"movq %%mm0,%%mm1\n\t"
"pxor %%mm2,%%mm2\n\t"
"movq %%mm6,%%mm3\n\t"
"psubw %%mm6,%%mm2\n\t"
"psllw $1,%%mm3\n\t"
"psllw $1,%%mm2\n\t"
/*mm0==R_3 R_2 R_1 R_0*/
/*mm1==R_3 R_2 R_1 R_0*/
/*mm2==-2L -2L -2L -2L*/
/*mm3==2L 2L 2L 2L*/
"pcmpgtw %%mm0,%%mm3\n\t"
"pcmpgtw %%mm2,%%mm1\n\t"
"pand %%mm3,%%mm0\n\t"
"pand %%mm1,%%mm0\n\t"
/*if(R_i<-L)R_i'=R_i+2L;
if(R_i>L)R_i'=R_i-2L;
if(R_i<-L||R_i>L)R_i=-R_i':*/
"psraw $1,%%mm2\n\t"
"movq %%mm0,%%mm1\n\t"
"movq %%mm6,%%mm3\n\t"
/*mm0==R_3 R_2 R_1 R_0*/
/*mm1==R_3 R_2 R_1 R_0*/
/*mm2==-L -L -L -L*/
/*mm6==L L L L*/
/*mm2=-L>R_i?FF:00*/
"pcmpgtw %%mm0,%%mm2\n\t"
/*mm1=R_i>L?FF:00*/
"pcmpgtw %%mm6,%%mm1\n\t"
/*mm3=2L 2L 2L 2L*/
"psllw $1,%%mm3\n\t"
/*mm6=2L 2L 2L 2L*/
"psllw $1,%%mm6\n\t"
/*mm3=R_i>L?2L:0*/
"pand %%mm1,%%mm3\n\t"
/*mm6=-L>R_i?2L:0*/
"pand %%mm2,%%mm6\n\t"
/*mm0=R_i>L?R_i-2L:R_i*/
"psubw %%mm3,%%mm0\n\t"
/*mm1=-L>R_i||R_i>L*/
"por %%mm2,%%mm1\n\t"
/*mm0=-L>R_i?R_i+2L:R_i*/
"paddw %%mm6,%%mm0\n\t"
/*mm1=-L>R_i||R_i>L?R_i':0*/
"pand %%mm0,%%mm1\n\t"
/*mm0=-L>R_i||R_i>L?0:R_i*/
"psubw %%mm1,%%mm0\n\t"
/*mm0=-L>R_i||R_i>L?-R_i':R_i*/
"psubw %%mm1,%%mm0\n\t"
/*_pix[1]+=R_i;*/
"paddw %%mm0,%%mm5\n\t"
/*_pix[2]-=R_i;*/
"psubw %%mm0,%%mm4\n\t"
/*mm5=x x x x D 9 5 1*/
"packuswb %%mm7,%%mm5\n\t"
/*mm4=x x x x E A 6 2*/
"packuswb %%mm7,%%mm4\n\t"
/*mm5=E D A 9 6 5 2 1*/
"punpcklbw %%mm4,%%mm5\n\t"
/*edi=6 5 2 1*/
"movd %%mm5,%%edi\n\t"
"movw %%di,1(%[pix])\n\t"
/*Why is there such a big stall here?*/
"psrlq $32,%%mm5\n\t"
"shrl $16,%%edi\n\t"
"movw %%di,1(%[pix],%[ystride])\n\t"
/*edi=E D A 9*/
"movd %%mm5,%%edi\n\t"
"movw %%di,1(%[pix],%[ystride],2)\n\t"
"shrl $16,%%edi\n\t"
"movw %%di,1(%[pix],%[s])\n\t"
:[s]"=&S"(esi),[d]"=&D"(edi),
[pix]"+r"(_pix),[ystride]"+r"(_ystride),[ll]"+r"(_ll)
:[OC_V3]"m"(OC_V3),[OC_V4]"m"(OC_V4)
:"memory"
);
}
static void loop_filter_h(unsigned char *_pix,int _ystride,
const ogg_int16_t *_ll){
_pix-=2;
loop_filter_h4(_pix,_ystride,_ll);
loop_filter_h4(_pix+(_ystride<<2),_ystride,_ll);
}
static void loop_filter_mmx(PB_INSTANCE *pbi, int FLimit){
int j;
ogg_int16_t __attribute__((aligned(8))) ll[4];
unsigned char *cp = pbi->display_fragments;
ogg_uint32_t *bp = pbi->recon_pixel_index_table;
if ( FLimit == 0 ) return;
ll[0]=ll[1]=ll[2]=ll[3]=FLimit;
for ( j = 0; j < 3 ; j++){
ogg_uint32_t *bp_begin = bp;
ogg_uint32_t *bp_end;
int stride;
int h;
switch(j) {
case 0: /* y */
bp_end = bp + pbi->YPlaneFragments;
h = pbi->HFragments;
stride = pbi->YStride;
break;
default: /* u,v, 4:2:0 specific */
bp_end = bp + pbi->UVPlaneFragments;
h = pbi->HFragments >> 1;
stride = pbi->UVStride;
break;
}
while(bp<bp_end){
ogg_uint32_t *bp_left = bp;
ogg_uint32_t *bp_right = bp + h;
while(bp<bp_right){
if(cp[0]){
if(bp>bp_left)
loop_filter_h(&pbi->LastFrameRecon[bp[0]],stride,ll);
if(bp_left>bp_begin)
loop_filter_v(&pbi->LastFrameRecon[bp[0]],stride,ll);
if(bp+1<bp_right && !cp[1])
loop_filter_h(&pbi->LastFrameRecon[bp[0]]+8,stride,ll);
if(bp+h<bp_end && !cp[h])
loop_filter_v(&pbi->LastFrameRecon[bp[h]],stride,ll);
}
bp++;
cp++;
}
}
}
__asm__ __volatile__("emms\n\t");
}
/* install our implementation in the function table */
void dsp_mmx_dct_decode_init(DspFunctions *funcs)
{
funcs->LoopFilter = loop_filter_mmx;
}
#endif /* USE_ASM */
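
The comments in loop_filter_v and loop_filter_h4 above describe the limiting step only through per-register masks. The following is a scalar sketch of the same arithmetic for one set of four pixels across an edge, written from those comments. The names `lflim_ref`, `filter_edge_ref` and `clamp255` are hypothetical, and `l` plays the role of the value stored in `ll[]`.

```c
#include <stdint.h>

static uint8_t clamp255(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Limit function as spelled out in the asm comments: zero outside
   (-2L, 2L), identity inside [-L, L], reflected in between. */
static int lflim_ref(int r, int l)
{
    if (r <= -2 * l || r >= 2 * l) return 0;
    if (r < -l) return -(r + 2 * l);
    if (r > l) return -(r - 2 * l);
    return r;
}

/* p[0..3] are the four pixels straddling the edge, two on each side. */
static void filter_edge_ref(uint8_t p[4], int l)
{
    int f = p[0] - p[3] + 3 * (p[2] - p[1]) + 4;
    /* psraw $3 is an arithmetic shift, i.e. floor division by 8. */
    f = f >= 0 ? f >> 3 : -((-f + 7) >> 3);
    f = lflim_ref(f, l);
    p[1] = clamp255(p[1] + f);   /* packuswb saturates the same way */
    p[2] = clamp255(p[2] - f);
}
```

For the pixels {10, 10, 20, 20} and l = 4 the raw filter value is 3, which lies inside [-L, L] and is applied unchanged, giving {10, 13, 17, 20}.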

View file

@ -1,303 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dsp_mmx.c 15397 2008-10-14 02:06:24Z tterribe $
********************************************************************/
#include <stdlib.h>
#include "../codec_internal.h"
#include "../dsp.h"
#if defined(USE_ASM)
typedef unsigned long long ogg_uint64_t;
static const __attribute__ ((aligned(8),used)) ogg_int64_t V128 = 0x0080008000800080LL;
#define DSP_OP_AVG(a,b) ((((int)(a)) + ((int)(b)))/2)
#define DSP_OP_DIFF(a,b) (((int)(a)) - ((int)(b)))
#define DSP_OP_ABS_DIFF(a,b) abs((((int)(a)) - ((int)(b))))
static void sub8x8__mmx (unsigned char *FiltPtr, unsigned char *ReconPtr,
ogg_int16_t *DctInputPtr, ogg_uint32_t PixelsPerLine,
ogg_uint32_t ReconPixelsPerLine)
{
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm7, %%mm7 \n\t"
".rept 8 \n\t"
" movq (%0), %%mm0 \n\t" /* mm0 = FiltPtr */
" movq (%1), %%mm1 \n\t" /* mm1 = ReconPtr */
" movq %%mm0, %%mm2 \n\t" /* dup to prepare for up conversion */
" movq %%mm1, %%mm3 \n\t" /* dup to prepare for up conversion */
/* convert from UINT8 to INT16 */
" punpcklbw %%mm7, %%mm0 \n\t" /* mm0 = INT16(FiltPtr) */
" punpcklbw %%mm7, %%mm1 \n\t" /* mm1 = INT16(ReconPtr) */
" punpckhbw %%mm7, %%mm2 \n\t" /* mm2 = INT16(FiltPtr) */
" punpckhbw %%mm7, %%mm3 \n\t" /* mm3 = INT16(ReconPtr) */
/* start calculation */
" psubw %%mm1, %%mm0 \n\t" /* mm0 = FiltPtr - ReconPtr */
" psubw %%mm3, %%mm2 \n\t" /* mm2 = FiltPtr - ReconPtr */
" movq %%mm0, (%2) \n\t" /* write answer out */
" movq %%mm2, 8(%2) \n\t" /* write answer out */
/* Increment pointers */
" add $16, %2 \n\t"
" add %3, %0 \n\t"
" add %4, %1 \n\t"
".endr \n\t"
: "+r" (FiltPtr),
"+r" (ReconPtr),
"+r" (DctInputPtr)
: "r" ((ogg_uint64_t)PixelsPerLine),
"r" ((ogg_uint64_t)ReconPixelsPerLine)
: "memory"
);
}
static void sub8x8_128__mmx (unsigned char *FiltPtr, ogg_int16_t *DctInputPtr,
ogg_uint32_t PixelsPerLine)
{
ogg_uint64_t ppl = PixelsPerLine;
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm7, %%mm7 \n\t"
" movq %[V128], %%mm1 \n\t"
".rept 8 \n\t"
" movq (%0), %%mm0 \n\t" /* mm0 = FiltPtr */
" movq %%mm0, %%mm2 \n\t" /* dup to prepare for up conversion */
/* convert from UINT8 to INT16 */
" punpcklbw %%mm7, %%mm0 \n\t" /* mm0 = INT16(FiltPtr) */
" punpckhbw %%mm7, %%mm2 \n\t" /* mm2 = INT16(FiltPtr) */
/* start calculation */
" psubw %%mm1, %%mm0 \n\t" /* mm0 = FiltPtr - 128 */
" psubw %%mm1, %%mm2 \n\t" /* mm2 = FiltPtr - 128 */
" movq %%mm0, (%1) \n\t" /* write answer out */
" movq %%mm2, 8(%1) \n\t" /* write answer out */
/* Increment pointers */
" add $16, %1 \n\t"
" add %2, %0 \n\t"
".endr \n\t"
: "+r" (FiltPtr),
"+r" (DctInputPtr)
: "r" (ppl), /* gcc bug? a cast won't work here, e.g. (ogg_uint64_t)PixelsPerLine */
[V128] "m" (V128)
: "memory"
);
}
static void sub8x8avg2__mmx (unsigned char *FiltPtr, unsigned char *ReconPtr1,
unsigned char *ReconPtr2, ogg_int16_t *DctInputPtr,
ogg_uint32_t PixelsPerLine,
ogg_uint32_t ReconPixelsPerLine)
{
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm7, %%mm7 \n\t"
".rept 8 \n\t"
" movq (%0), %%mm0 \n\t" /* mm0 = FiltPtr */
" movq (%1), %%mm1 \n\t" /* mm1 = ReconPtr1 */
" movq (%2), %%mm4 \n\t" /* mm4 = ReconPtr2 */
" movq %%mm0, %%mm2 \n\t" /* dup to prepare for up conversion */
" movq %%mm1, %%mm3 \n\t" /* dup to prepare for up conversion */
" movq %%mm4, %%mm5 \n\t" /* dup to prepare for up conversion */
/* convert from UINT8 to INT16 */
" punpcklbw %%mm7, %%mm0 \n\t" /* mm0 = INT16(FiltPtr) */
" punpcklbw %%mm7, %%mm1 \n\t" /* mm1 = INT16(ReconPtr1) */
" punpcklbw %%mm7, %%mm4 \n\t" /* mm4 = INT16(ReconPtr2) */
" punpckhbw %%mm7, %%mm2 \n\t" /* mm2 = INT16(FiltPtr) */
" punpckhbw %%mm7, %%mm3 \n\t" /* mm3 = INT16(ReconPtr1) */
" punpckhbw %%mm7, %%mm5 \n\t" /* mm5 = INT16(ReconPtr2) */
/* average ReconPtr1 and ReconPtr2 */
" paddw %%mm4, %%mm1 \n\t" /* mm1 = ReconPtr1 + ReconPtr2 */
" paddw %%mm5, %%mm3 \n\t" /* mm3 = ReconPtr1 + ReconPtr2 */
" psrlw $1, %%mm1 \n\t" /* mm1 = (ReconPtr1 + ReconPtr2) / 2 */
" psrlw $1, %%mm3 \n\t" /* mm3 = (ReconPtr1 + ReconPtr2) / 2 */
" psubw %%mm1, %%mm0 \n\t" /* mm0 = FiltPtr - ((ReconPtr1 + ReconPtr2) / 2) */
" psubw %%mm3, %%mm2 \n\t" /* mm2 = FiltPtr - ((ReconPtr1 + ReconPtr2) / 2) */
" movq %%mm0, (%3) \n\t" /* write answer out */
" movq %%mm2, 8(%3) \n\t" /* write answer out */
/* Increment pointers */
" add $16, %3 \n\t"
" add %4, %0 \n\t"
" add %5, %1 \n\t"
" add %5, %2 \n\t"
".endr \n\t"
: "+r" (FiltPtr),
"+r" (ReconPtr1),
"+r" (ReconPtr2),
"+r" (DctInputPtr)
: "r" ((ogg_uint64_t)PixelsPerLine),
"r" ((ogg_uint64_t)ReconPixelsPerLine)
: "memory"
);
}
static ogg_uint32_t intra8x8_err__mmx (unsigned char *DataPtr, ogg_uint32_t Stride)
{
ogg_uint64_t XSum;
ogg_uint64_t XXSum;
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm5, %%mm5 \n\t"
" pxor %%mm6, %%mm6 \n\t"
" pxor %%mm7, %%mm7 \n\t"
" mov $8, %%rdi \n\t"
"1: \n\t"
" movq (%2), %%mm0 \n\t" /* take 8 bytes */
" movq %%mm0, %%mm2 \n\t"
" punpcklbw %%mm6, %%mm0 \n\t"
" punpckhbw %%mm6, %%mm2 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" paddw %%mm2, %%mm5 \n\t"
" pmaddwd %%mm0, %%mm0 \n\t"
" pmaddwd %%mm2, %%mm2 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" paddd %%mm2, %%mm7 \n\t"
" add %3, %2 \n\t" /* Inc pointer into src data */
" dec %%rdi \n\t"
" jnz 1b \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $32, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $16, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movd %%mm5, %%rdi \n\t"
" movsx %%di, %%rdi \n\t"
" mov %%rdi, %0 \n\t"
" movq %%mm7, %%mm0 \n\t"
" psrlq $32, %%mm7 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" movd %%mm7, %1 \n\t"
: "=r" (XSum),
"=r" (XXSum),
"+r" (DataPtr)
: "r" ((ogg_uint64_t)Stride)
: "rdi", "memory"
);
/* Compute population variance as mis-match metric. */
return (( (XXSum<<6) - XSum*XSum ) );
}
static ogg_uint32_t inter8x8_err__mmx (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr, ogg_uint32_t RefStride)
{
ogg_uint64_t XSum;
ogg_uint64_t XXSum;
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm5, %%mm5 \n\t"
" pxor %%mm6, %%mm6 \n\t"
" pxor %%mm7, %%mm7 \n\t"
" mov $8, %%rdi \n\t"
"1: \n\t"
" movq (%2), %%mm0 \n\t" /* take 8 bytes */
" movq (%3), %%mm1 \n\t"
" movq %%mm0, %%mm2 \n\t"
" movq %%mm1, %%mm3 \n\t"
" punpcklbw %%mm6, %%mm0 \n\t"
" punpcklbw %%mm6, %%mm1 \n\t"
" punpckhbw %%mm6, %%mm2 \n\t"
" punpckhbw %%mm6, %%mm3 \n\t"
" psubsw %%mm1, %%mm0 \n\t"
" psubsw %%mm3, %%mm2 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" paddw %%mm2, %%mm5 \n\t"
" pmaddwd %%mm0, %%mm0 \n\t"
" pmaddwd %%mm2, %%mm2 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" paddd %%mm2, %%mm7 \n\t"
" add %4, %2 \n\t" /* Inc pointer into src data */
" add %5, %3 \n\t" /* Inc pointer into ref data */
" dec %%rdi \n\t"
" jnz 1b \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $32, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $16, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movd %%mm5, %%rdi \n\t"
" movsx %%di, %%rdi \n\t"
" mov %%rdi, %0 \n\t"
" movq %%mm7, %%mm0 \n\t"
" psrlq $32, %%mm7 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" movd %%mm7, %1 \n\t"
: "=m" (XSum),
"=m" (XXSum),
"+r" (SrcData),
"+r" (RefDataPtr)
: "r" ((ogg_uint64_t)SrcStride),
"r" ((ogg_uint64_t)RefStride)
: "rdi", "memory"
);
/* Compute and return population variance as mis-match metric. */
return (( (XXSum<<6) - XSum*XSum ));
}
static void restore_fpu (void)
{
__asm__ __volatile__ (
" emms \n\t"
);
}
void dsp_mmx_init(DspFunctions *funcs)
{
funcs->restore_fpu = restore_fpu;
funcs->sub8x8 = sub8x8__mmx;
funcs->sub8x8_128 = sub8x8_128__mmx;
funcs->sub8x8avg2 = sub8x8avg2__mmx;
funcs->intra8x8_err = intra8x8_err__mmx;
funcs->inter8x8_err = inter8x8_err__mmx;
}
#endif /* USE_ASM */
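
The value returned by intra8x8_err__mmx above, `(XXSum<<6) - XSum*XSum`, is 64 times the sum of squares minus the square of the sum, which equals 64 squared times the population variance of the 64 pixels. Below is a scalar sketch of that computation. The name `intra8x8_err_ref` is hypothetical, and plain `<stdint.h>` types stand in for the `ogg_*` typedefs.

```c
#include <stdint.h>

static uint32_t intra8x8_err_ref(const uint8_t *data, uint32_t stride)
{
    uint32_t xsum = 0;    /* sum of the 64 pixel values */
    uint32_t xxsum = 0;   /* sum of their squares */
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            xsum += data[x];
            xxsum += (uint32_t)data[x] * data[x];
        }
        data += stride;   /* advance to the next row */
    }
    /* 64 * sum(x^2) - (sum x)^2 == 64^2 * population variance */
    return (xxsum << 6) - xsum * xsum;
}
```

A flat block returns 0. A block that is half 0 and half 2 has variance 1 and returns 4096.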

View file

@ -1,323 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: dsp_mmxext.c 15397 2008-10-14 02:06:24Z tterribe $
********************************************************************/
#include <stdlib.h>
#include "../codec_internal.h"
#include "../dsp.h"
#if defined(USE_ASM)
typedef unsigned long long ogg_uint64_t;
static ogg_uint32_t sad8x8__mmxext (unsigned char *ptr1, ogg_uint32_t stride1,
unsigned char *ptr2, ogg_uint32_t stride2)
{
ogg_uint32_t DiffVal;
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm7, %%mm7 \n\t" /* mm7 contains the result */
".rept 7 \n\t"
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t"
" psadbw %%mm1, %%mm0 \n\t"
" add %3, %1 \n\t" /* Inc pointer into the new data */
" paddw %%mm0, %%mm7 \n\t" /* accumulate difference... */
" add %4, %2 \n\t" /* Inc pointer into ref data */
".endr \n\t"
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t"
" psadbw %%mm1, %%mm0 \n\t"
" paddw %%mm0, %%mm7 \n\t" /* accumulate difference... */
" movd %%mm7, %0 \n\t"
: "=r" (DiffVal),
"+r" (ptr1),
"+r" (ptr2)
: "r" ((ogg_uint64_t)stride1),
"r" ((ogg_uint64_t)stride2)
: "memory"
);
return DiffVal;
}
static ogg_uint32_t sad8x8_thres__mmxext (unsigned char *ptr1, ogg_uint32_t stride1,
unsigned char *ptr2, ogg_uint32_t stride2,
ogg_uint32_t thres)
{
ogg_uint32_t DiffVal;
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm7, %%mm7 \n\t" /* mm7 contains the result */
".rept 8 \n\t"
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t"
" psadbw %%mm1, %%mm0 \n\t"
" add %3, %1 \n\t" /* Inc pointer into the new data */
" paddw %%mm0, %%mm7 \n\t" /* accumulate difference... */
" add %4, %2 \n\t" /* Inc pointer into ref data */
".endr \n\t"
" movd %%mm7, %0 \n\t"
: "=r" (DiffVal),
"+r" (ptr1),
"+r" (ptr2)
: "r" ((ogg_uint64_t)stride1),
"r" ((ogg_uint64_t)stride2)
: "memory"
);
return DiffVal;
}
static ogg_uint32_t sad8x8_xy2_thres__mmxext (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr1,
unsigned char *RefDataPtr2, ogg_uint32_t RefStride,
ogg_uint32_t thres)
{
ogg_uint32_t DiffVal;
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm7, %%mm7 \n\t" /* mm7 contains the result */
".rept 8 \n\t"
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t"
" movq (%3), %%mm2 \n\t"
" pavgb %%mm2, %%mm1 \n\t"
" psadbw %%mm1, %%mm0 \n\t"
" add %4, %1 \n\t" /* Inc pointer into the new data */
" paddw %%mm0, %%mm7 \n\t" /* accumulate difference... */
" add %5, %2 \n\t" /* Inc pointer into ref data */
" add %5, %3 \n\t" /* Inc pointer into ref data */
".endr \n\t"
" movd %%mm7, %0 \n\t"
: "=m" (DiffVal),
"+r" (SrcData),
"+r" (RefDataPtr1),
"+r" (RefDataPtr2)
: "r" ((ogg_uint64_t)SrcStride),
"r" ((ogg_uint64_t)RefStride)
: "memory"
);
return DiffVal;
}
static ogg_uint32_t row_sad8__mmxext (unsigned char *Src1, unsigned char *Src2)
{
ogg_uint32_t MaxSad;
__asm__ __volatile__ (
" .balign 16 \n\t"
" movd (%1), %%mm0 \n\t"
" movd (%2), %%mm1 \n\t"
" psadbw %%mm0, %%mm1 \n\t"
" movd 4(%1), %%mm2 \n\t"
" movd 4(%2), %%mm3 \n\t"
" psadbw %%mm2, %%mm3 \n\t"
" pmaxsw %%mm1, %%mm3 \n\t"
" movd %%mm3, %0 \n\t"
" andl $0xffff, %0 \n\t"
: "=m" (MaxSad),
"+r" (Src1),
"+r" (Src2)
:
: "memory"
);
return MaxSad;
}
static ogg_uint32_t col_sad8x8__mmxext (unsigned char *Src1, unsigned char *Src2,
ogg_uint32_t stride)
{
ogg_uint32_t MaxSad;
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm3, %%mm3 \n\t" /* zero out mm3 for unpack */
" pxor %%mm4, %%mm4 \n\t" /* mm4 low sum */
" pxor %%mm5, %%mm5 \n\t" /* mm5 high sum */
" pxor %%mm6, %%mm6 \n\t" /* mm6 low sum */
" pxor %%mm7, %%mm7 \n\t" /* mm7 high sum */
" mov $4, %%rdi \n\t" /* 4 rows */
"1: \n\t"
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t" /* take 8 bytes */
" movq %%mm0, %%mm2 \n\t"
" psubusb %%mm1, %%mm0 \n\t" /* A - B */
" psubusb %%mm2, %%mm1 \n\t" /* B - A */
" por %%mm1, %%mm0 \n\t" /* and or gives abs difference */
" movq %%mm0, %%mm1 \n\t"
" punpcklbw %%mm3, %%mm0 \n\t" /* unpack to higher precision for accumulation */
" paddw %%mm0, %%mm4 \n\t" /* accumulate difference... */
" punpckhbw %%mm3, %%mm1 \n\t" /* unpack high four bytes to higher precision */
" paddw %%mm1, %%mm5 \n\t" /* accumulate difference... */
" add %3, %1 \n\t" /* Inc pointer into the new data */
" add %3, %2 \n\t" /* Inc pointer into the new data */
" dec %%rdi \n\t"
" jnz 1b \n\t"
" mov $4, %%rdi \n\t" /* 4 rows */
"2: \n\t"
" movq (%1), %%mm0 \n\t" /* take 8 bytes */
" movq (%2), %%mm1 \n\t" /* take 8 bytes */
" movq %%mm0, %%mm2 \n\t"
" psubusb %%mm1, %%mm0 \n\t" /* A - B */
" psubusb %%mm2, %%mm1 \n\t" /* B - A */
" por %%mm1, %%mm0 \n\t" /* and or gives abs difference */
" movq %%mm0, %%mm1 \n\t"
" punpcklbw %%mm3, %%mm0 \n\t" /* unpack to higher precision for accumulation */
" paddw %%mm0, %%mm6 \n\t" /* accumulate difference... */
" punpckhbw %%mm3, %%mm1 \n\t" /* unpack high four bytes to higher precision */
" paddw %%mm1, %%mm7 \n\t" /* accumulate difference... */
" add %3, %1 \n\t" /* Inc pointer into the new data */
" add %3, %2 \n\t" /* Inc pointer into the new data */
" dec %%rdi \n\t"
" jnz 2b \n\t"
" pmaxsw %%mm6, %%mm7 \n\t"
" pmaxsw %%mm4, %%mm5 \n\t"
" pmaxsw %%mm5, %%mm7 \n\t"
" movq %%mm7, %%mm6 \n\t"
" psrlq $32, %%mm6 \n\t"
" pmaxsw %%mm6, %%mm7 \n\t"
" movq %%mm7, %%mm6 \n\t"
" psrlq $16, %%mm6 \n\t"
" pmaxsw %%mm6, %%mm7 \n\t"
" movd %%mm7, %0 \n\t"
" andl $0xffff, %0 \n\t"
: "=r" (MaxSad),
"+r" (Src1),
"+r" (Src2)
: "r" ((ogg_uint64_t)stride)
: "memory", "rdi"
);
return MaxSad;
}
static ogg_uint32_t inter8x8_err_xy2__mmxext (unsigned char *SrcData, ogg_uint32_t SrcStride,
unsigned char *RefDataPtr1,
unsigned char *RefDataPtr2, ogg_uint32_t RefStride)
{
ogg_uint64_t XSum;
ogg_uint64_t XXSum;
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm4, %%mm4 \n\t"
" pxor %%mm5, %%mm5 \n\t"
" pxor %%mm6, %%mm6 \n\t"
" pxor %%mm7, %%mm7 \n\t"
" mov $8, %%rdi \n\t"
"1: \n\t"
" movq (%2), %%mm0 \n\t" /* take 8 bytes */
" movq (%3), %%mm2 \n\t" /* take 8 bytes from each reference */
" movq (%4), %%mm1 \n\t"
" pavgb %%mm2, %%mm1 \n\t" /* mm1 = rounded average of mm2 and mm1 */
" movq %%mm0, %%mm2 \n\t"
" movq %%mm1, %%mm3 \n\t"
" punpcklbw %%mm6, %%mm0 \n\t"
" punpcklbw %%mm4, %%mm1 \n\t"
" punpckhbw %%mm6, %%mm2 \n\t"
" punpckhbw %%mm4, %%mm3 \n\t"
" psubsw %%mm1, %%mm0 \n\t"
" psubsw %%mm3, %%mm2 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" paddw %%mm2, %%mm5 \n\t"
" pmaddwd %%mm0, %%mm0 \n\t"
" pmaddwd %%mm2, %%mm2 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" paddd %%mm2, %%mm7 \n\t"
" add %5, %2 \n\t" /* Inc pointer into src data */
" add %6, %3 \n\t" /* Inc pointer into ref data */
" add %6, %4 \n\t" /* Inc pointer into ref data */
" dec %%rdi \n\t"
" jnz 1b \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $32, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movq %%mm5, %%mm0 \n\t"
" psrlq $16, %%mm5 \n\t"
" paddw %%mm0, %%mm5 \n\t"
" movd %%mm5, %%edi \n\t"
" movsx %%di, %%edi \n\t"
" movl %%edi, %0 \n\t"
" movq %%mm7, %%mm0 \n\t"
" psrlq $32, %%mm7 \n\t"
" paddd %%mm0, %%mm7 \n\t"
" movd %%mm7, %1 \n\t"
: "=m" (XSum),
"=m" (XXSum),
"+r" (SrcData),
"+r" (RefDataPtr1),
"+r" (RefDataPtr2)
: "r" ((ogg_uint64_t)SrcStride),
"r" ((ogg_uint64_t)RefStride)
: "rdi", "memory"
);
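/* Return 64*sum(d*d) - sum(d)*sum(d), i.e. the population variance of the
   64 differences scaled by 64*64, as the mismatch metric.
   The asm stores only the low 32 bits of XSum and XXSum; the result is
   truncated to 32 bits, so the unwritten upper halves do not affect it. */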
return (( (XXSum<<6) - XSum*XSum ));
}
void dsp_mmxext_init(DspFunctions *funcs)
{
funcs->row_sad8 = row_sad8__mmxext;
funcs->col_sad8x8 = col_sad8x8__mmxext;
funcs->sad8x8 = sad8x8__mmxext;
funcs->sad8x8_thres = sad8x8_thres__mmxext;
funcs->sad8x8_xy2_thres = sad8x8_xy2_thres__mmxext;
funcs->inter8x8_err_xy2 = inter8x8_err_xy2__mmxext;
}
#endif /* USE_ASM */

@ -1,342 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 1999-2006 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************/
/* mmx fdct implementation for x86_64 */
/* $Id: fdct_mmx.c 15397 2008-10-14 02:06:24Z tterribe $ */
#include "theora/theora.h"
#include "../codec_internal.h"
#include "../dsp.h"
#if defined(USE_ASM)
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC1S7 = 0x0fb15fb15fb15fb15LL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC2S6 = 0x0ec83ec83ec83ec83LL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC3S5 = 0x0d4dbd4dbd4dbd4dbLL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC4S4 = 0x0b505b505b505b505LL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC5S3 = 0x08e3a8e3a8e3a8e3aLL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC6S2 = 0x061f861f861f861f8LL;
static const __attribute__ ((aligned(8),used)) ogg_int64_t xC7S1 = 0x031f131f131f131f1LL;
#if defined(__MINGW32__) || defined(__CYGWIN__) || \
defined(__OS2__) || (defined (__OpenBSD__) && !defined(__ELF__))
# define M(a) "_" #a
#else
# define M(a) #a
#endif
/* execute stage 1 of forward DCT */
#define Fdct_mmx(ip0,ip1,ip2,ip3,ip4,ip5,ip6,ip7,temp) \
" movq " #ip0 ", %%mm0 \n\t" \
" movq " #ip1 ", %%mm1 \n\t" \
" movq " #ip3 ", %%mm2 \n\t" \
" movq " #ip5 ", %%mm3 \n\t" \
" movq %%mm0, %%mm4 \n\t" \
" movq %%mm1, %%mm5 \n\t" \
" movq %%mm2, %%mm6 \n\t" \
" movq %%mm3, %%mm7 \n\t" \
\
" paddsw " #ip7 ", %%mm0 \n\t" /* mm0 = ip0 + ip7 = is07 */ \
" paddsw " #ip2 ", %%mm1 \n\t" /* mm1 = ip1 + ip2 = is12 */ \
" paddsw " #ip4 ", %%mm2 \n\t" /* mm2 = ip3 + ip4 = is34 */ \
" paddsw " #ip6 ", %%mm3 \n\t" /* mm3 = ip5 + ip6 = is56 */ \
" psubsw " #ip7 ", %%mm4 \n\t" /* mm4 = ip0 - ip7 = id07 */ \
" psubsw " #ip2 ", %%mm5 \n\t" /* mm5 = ip1 - ip2 = id12 */ \
\
" psubsw %%mm2, %%mm0 \n\t" /* mm0 = is07 - is34 */ \
\
" paddsw %%mm2, %%mm2 \n\t" \
\
" psubsw " #ip4 ", %%mm6 \n\t" /* mm6 = ip3 - ip4 = id34 */ \
\
" paddsw %%mm0, %%mm2 \n\t" /* mm2 = is07 + is34 = is0734 */ \
" psubsw %%mm3, %%mm1 \n\t" /* mm1 = is12 - is56 */ \
" movq %%mm0," #temp " \n\t" /* Save is07 - is34 to free mm0; */ \
" paddsw %%mm3, %%mm3 \n\t" \
" paddsw %%mm1, %%mm3 \n\t" /* mm3 = is12 + is56 = is1256 */ \
\
" psubsw " #ip6 ", %%mm7 \n\t" /* mm7 = ip5 - ip6 = id56 */ \
/* ------------------------------------------------------------------- */ \
" psubsw %%mm7, %%mm5 \n\t" /* mm5 = id12 - id56 */ \
" paddsw %%mm7, %%mm7 \n\t" \
" paddsw %%mm5, %%mm7 \n\t" /* mm7 = id12 + id56 */ \
/* ------------------------------------------------------------------- */ \
" psubsw %%mm3, %%mm2 \n\t" /* mm2 = is0734 - is1256 */ \
" paddsw %%mm3, %%mm3 \n\t" \
\
" movq %%mm2, %%mm0 \n\t" /* make a copy */ \
" paddsw %%mm2, %%mm3 \n\t" /* mm3 = is0734 + is1256 */ \
\
" pmulhw %[xC4S4], %%mm0 \n\t" /* mm0 = xC4S4 * ( is0734 - is1256 ) - ( is0734 - is1256 ) */ \
" paddw %%mm2, %%mm0 \n\t" /* mm0 = xC4S4 * ( is0734 - is1256 ) */ \
" psrlw $15, %%mm2 \n\t" \
" paddw %%mm2, %%mm0 \n\t" /* Truncate mm0, now it is op[4] */ \
\
" movq %%mm3, %%mm2 \n\t" \
" movq %%mm0," #ip4 " \n\t" /* save ip4, now mm0,mm2 are free */ \
\
" movq %%mm3, %%mm0 \n\t" \
" pmulhw %[xC4S4], %%mm3 \n\t" /* mm3 = xC4S4 * ( is0734 +is1256 ) - ( is0734 +is1256 ) */ \
\
" psrlw $15, %%mm2 \n\t" \
" paddw %%mm0, %%mm3 \n\t" /* mm3 = xC4S4 * ( is0734 +is1256 ) */ \
" paddw %%mm2, %%mm3 \n\t" /* Truncate mm3, now it is op[0] */ \
\
" movq %%mm3," #ip0 " \n\t" \
/* ------------------------------------------------------------------- */ \
" movq " #temp ", %%mm3 \n\t" /* mm3 = irot_input_y */ \
" pmulhw %[xC2S6], %%mm3 \n\t" /* mm3 = xC2S6 * irot_input_y - irot_input_y */ \
\
" movq " #temp ", %%mm2 \n\t" \
" movq %%mm2, %%mm0 \n\t" \
\
" psrlw $15, %%mm2 \n\t" \
" paddw %%mm0, %%mm3 \n\t" /* mm3 = xC2S6 * irot_input_y */ \
\
" paddw %%mm2, %%mm3 \n\t" /* Truncated */ \
" movq %%mm5, %%mm0 \n\t" \
\
" movq %%mm5, %%mm2 \n\t" \
" pmulhw %[xC6S2], %%mm0 \n\t" /* mm0 = xC6S2 * irot_input_x */ \
\
" psrlw $15, %%mm2 \n\t" \
" paddw %%mm2, %%mm0 \n\t" /* Truncated */ \
\
" paddsw %%mm0, %%mm3 \n\t" /* ip[2] */ \
" movq %%mm3," #ip2 " \n\t" /* Save ip2 */ \
\
" movq %%mm5, %%mm0 \n\t" \
" movq %%mm5, %%mm2 \n\t" \
\
" pmulhw %[xC2S6], %%mm5 \n\t" /* mm5 = xC2S6 * irot_input_x - irot_input_x */ \
" psrlw $15, %%mm2 \n\t" \
\
" movq " #temp ", %%mm3 \n\t" \
" paddw %%mm0, %%mm5 \n\t" /* mm5 = xC2S6 * irot_input_x */ \
\
" paddw %%mm2, %%mm5 \n\t" /* Truncated */ \
" movq %%mm3, %%mm2 \n\t" \
\
" pmulhw %[xC6S2], %%mm3 \n\t" /* mm3 = xC6S2 * irot_input_y */ \
" psrlw $15, %%mm2 \n\t" \
\
" paddw %%mm2, %%mm3 \n\t" /* Truncated */ \
" psubsw %%mm5, %%mm3 \n\t" \
\
" movq %%mm3," #ip6 " \n\t" \
/* ------------------------------------------------------------------- */ \
" movq %[xC4S4], %%mm0 \n\t" \
" movq %%mm1, %%mm2 \n\t" \
" movq %%mm1, %%mm3 \n\t" \
\
" pmulhw %%mm0, %%mm1 \n\t" /* mm1 = xC4S4 * ( is12 - is56 ) - ( is12 - is56 ) */ \
" psrlw $15, %%mm2 \n\t" \
\
" paddw %%mm3, %%mm1 \n\t" /* mm1 = xC4S4 * ( is12 - is56 ) */ \
" paddw %%mm2, %%mm1 \n\t" /* Truncate mm1, now it is icommon_product1 */ \
\
" movq %%mm7, %%mm2 \n\t" \
" movq %%mm7, %%mm3 \n\t" \
\
" pmulhw %%mm0, %%mm7 \n\t" /* mm7 = xC4S4 * ( id12 + id56 ) - ( id12 + id56 ) */ \
" psrlw $15, %%mm2 \n\t" \
\
" paddw %%mm3, %%mm7 \n\t" /* mm7 = xC4S4 * ( id12 + id56 ) */ \
" paddw %%mm2, %%mm7 \n\t" /* Truncate mm7, now it is icommon_product2 */ \
/* ------------------------------------------------------------------- */ \
" pxor %%mm0, %%mm0 \n\t" /* Clear mm0 */ \
" psubsw %%mm6, %%mm0 \n\t" /* mm0 = - id34 */ \
\
" psubsw %%mm7, %%mm0 \n\t" /* mm0 = - ( id34 + icommon_product2 ) */ \
" paddsw %%mm6, %%mm6 \n\t" \
" paddsw %%mm0, %%mm6 \n\t" /* mm6 = id34 - icommon_product2 */ \
\
" psubsw %%mm1, %%mm4 \n\t" /* mm4 = id07 - icommon_product1 */ \
" paddsw %%mm1, %%mm1 \n\t" \
" paddsw %%mm4, %%mm1 \n\t" /* mm1 = id07 + icommon_product1 */ \
/* ------------------------------------------------------------------- */ \
" movq %[xC1S7], %%mm7 \n\t" \
" movq %%mm1, %%mm2 \n\t" \
\
" movq %%mm1, %%mm3 \n\t" \
" pmulhw %%mm7, %%mm1 \n\t" /* mm1 = xC1S7 * irot_input_x - irot_input_x */ \
\
" movq %[xC7S1], %%mm7 \n\t" \
" psrlw $15, %%mm2 \n\t" \
\
" paddw %%mm3, %%mm1 \n\t" /* mm1 = xC1S7 * irot_input_x */ \
" paddw %%mm2, %%mm1 \n\t" /* Truncated */ \
\
" pmulhw %%mm7, %%mm3 \n\t" /* mm3 = xC7S1 * irot_input_x */ \
" paddw %%mm2, %%mm3 \n\t" /* Truncated */ \
\
" movq %%mm0, %%mm5 \n\t" \
" movq %%mm0, %%mm2 \n\t" \
\
" movq %[xC1S7], %%mm7 \n\t" \
" pmulhw %%mm7, %%mm0 \n\t" /* mm0 = xC1S7 * irot_input_y - irot_input_y */ \
\
" movq %[xC7S1], %%mm7 \n\t" \
" psrlw $15, %%mm2 \n\t" \
\
" paddw %%mm5, %%mm0 \n\t" /* mm0 = xC1S7 * irot_input_y */ \
" paddw %%mm2, %%mm0 \n\t" /* Truncated */ \
\
" pmulhw %%mm7, %%mm5 \n\t" /* mm5 = xC7S1 * irot_input_y */ \
" paddw %%mm2, %%mm5 \n\t" /* Truncated */ \
\
" psubsw %%mm5, %%mm1 \n\t" /* mm1 = xC1S7 * irot_input_x - xC7S1 * irot_input_y = ip1 */ \
" paddsw %%mm0, %%mm3 \n\t" /* mm3 = xC7S1 * irot_input_x + xC1S7 * irot_input_y = ip7 */ \
\
" movq %%mm1," #ip1 " \n\t" \
" movq %%mm3," #ip7 " \n\t" \
/* ------------------------------------------------------------------- */ \
" movq %[xC3S5], %%mm0 \n\t" \
" movq %[xC5S3], %%mm1 \n\t" \
\
" movq %%mm6, %%mm5 \n\t" \
" movq %%mm6, %%mm7 \n\t" \
\
" movq %%mm4, %%mm2 \n\t" \
" movq %%mm4, %%mm3 \n\t" \
\
" pmulhw %%mm0, %%mm4 \n\t" /* mm4 = xC3S5 * irot_input_x - irot_input_x */ \
" pmulhw %%mm1, %%mm6 \n\t" /* mm6 = xC5S3 * irot_input_y - irot_input_y */ \
\
" psrlw $15, %%mm2 \n\t" \
" psrlw $15, %%mm5 \n\t" \
\
" paddw %%mm3, %%mm4 \n\t" /* mm4 = xC3S5 * irot_input_x */ \
" paddw %%mm7, %%mm6 \n\t" /* mm6 = xC5S3 * irot_input_y */ \
\
" paddw %%mm2, %%mm4 \n\t" /* Truncated */ \
" paddw %%mm5, %%mm6 \n\t" /* Truncated */ \
\
" psubsw %%mm6, %%mm4 \n\t" /* ip3 */ \
" movq %%mm4," #ip3 " \n\t" \
\
" movq %%mm3, %%mm4 \n\t" \
" movq %%mm7, %%mm6 \n\t" \
\
" pmulhw %%mm1, %%mm3 \n\t" /* mm3 = xC5S3 * irot_input_x - irot_input_x */ \
" pmulhw %%mm0, %%mm7 \n\t" /* mm7 = xC3S5 * irot_input_y - irot_input_y */ \
\
" paddw %%mm2, %%mm4 \n\t" \
" paddw %%mm5, %%mm6 \n\t" \
\
" paddw %%mm4, %%mm3 \n\t" /* mm3 = xC5S3 * irot_input_x */ \
" paddw %%mm6, %%mm7 \n\t" /* mm7 = xC3S5 * irot_input_y */ \
\
" paddw %%mm7, %%mm3 \n\t" /* ip5 */ \
" movq %%mm3," #ip5 " \n\t"
#define Transpose_mmx(ip0,ip1,ip2,ip3,ip4,ip5,ip6,ip7, \
op0,op1,op2,op3,op4,op5,op6,op7) \
" movq " #ip0 ", %%mm0 \n\t" /* mm0 = a0 a1 a2 a3 */ \
" movq " #ip4 ", %%mm4 \n\t" /* mm4 = e4 e5 e6 e7 */ \
" movq " #ip1 ", %%mm1 \n\t" /* mm1 = b0 b1 b2 b3 */ \
" movq " #ip5 ", %%mm5 \n\t" /* mm5 = f4 f5 f6 f7 */ \
" movq " #ip2 ", %%mm2 \n\t" /* mm2 = c0 c1 c2 c3 */ \
" movq " #ip6 ", %%mm6 \n\t" /* mm6 = g4 g5 g6 g7 */ \
" movq " #ip3 ", %%mm3 \n\t" /* mm3 = d0 d1 d2 d3 */ \
" movq %%mm1," #op1 " \n\t" /* save b0 b1 b2 b3 */ \
" movq " #ip7 ", %%mm7 \n\t" /* mm7 = h0 h1 h2 h3 */ \
/* Transpose 2x8 block */ \
" movq %%mm4, %%mm1 \n\t" /* mm1 = e3 e2 e1 e0 */ \
" punpcklwd %%mm5, %%mm4 \n\t" /* mm4 = f1 e1 f0 e0 */ \
" movq %%mm0," #op0 " \n\t" /* save a3 a2 a1 a0 */ \
" punpckhwd %%mm5, %%mm1 \n\t" /* mm1 = f3 e3 f2 e2 */ \
" movq %%mm6, %%mm0 \n\t" /* mm0 = g3 g2 g1 g0 */ \
" punpcklwd %%mm7, %%mm6 \n\t" /* mm6 = h1 g1 h0 g0 */ \
" movq %%mm4, %%mm5 \n\t" /* mm5 = f1 e1 f0 e0 */ \
" punpckldq %%mm6, %%mm4 \n\t" /* mm4 = h0 g0 f0 e0 = MM4 */ \
" punpckhdq %%mm6, %%mm5 \n\t" /* mm5 = h1 g1 f1 e1 = MM5 */ \
" movq %%mm1, %%mm6 \n\t" /* mm6 = f3 e3 f2 e2 */ \
" movq %%mm4," #op4 " \n\t" \
" punpckhwd %%mm7, %%mm0 \n\t" /* mm0 = h3 g3 h2 g2 */ \
" movq %%mm5," #op5 " \n\t" \
" punpckhdq %%mm0, %%mm6 \n\t" /* mm6 = h3 g3 f3 e3 = MM7 */ \
" movq " #op0 ", %%mm4 \n\t" /* mm4 = a3 a2 a1 a0 */ \
" punpckldq %%mm0, %%mm1 \n\t" /* mm1 = h2 g2 f2 e2 = MM6 */ \
" movq " #op1 ", %%mm5 \n\t" /* mm5 = b3 b2 b1 b0 */ \
" movq %%mm4, %%mm0 \n\t" /* mm0 = a3 a2 a1 a0 */ \
" movq %%mm6," #op7 " \n\t" \
" punpcklwd %%mm5, %%mm0 \n\t" /* mm0 = b1 a1 b0 a0 */ \
" movq %%mm1," #op6 " \n\t" \
" punpckhwd %%mm5, %%mm4 \n\t" /* mm4 = b3 a3 b2 a2 */ \
" movq %%mm2, %%mm5 \n\t" /* mm5 = c3 c2 c1 c0 */ \
" punpcklwd %%mm3, %%mm2 \n\t" /* mm2 = d1 c1 d0 c0 */ \
" movq %%mm0, %%mm1 \n\t" /* mm1 = b1 a1 b0 a0 */ \
" punpckldq %%mm2, %%mm0 \n\t" /* mm0 = d0 c0 b0 a0 = MM0 */ \
" punpckhdq %%mm2, %%mm1 \n\t" /* mm1 = d1 c1 b1 a1 = MM1 */ \
" movq %%mm4, %%mm2 \n\t" /* mm2 = b3 a3 b2 a2 */ \
" movq %%mm0," #op0 " \n\t" \
" punpckhwd %%mm3, %%mm5 \n\t" /* mm5 = d3 c3 d2 c2 */ \
" movq %%mm1," #op1 " \n\t" \
" punpckhdq %%mm5, %%mm4 \n\t" /* mm4 = d3 c3 b3 a3 = MM3 */ \
" punpckldq %%mm5, %%mm2 \n\t" /* mm2 = d2 c2 b2 a2 = MM2 */ \
" movq %%mm4," #op3 " \n\t" \
" movq %%mm2," #op2 " \n\t"
/* This performs a 2D Forward DCT on an 8x8 block with short
coefficients. We try to do the truncation to match the C
version. */
static void fdct_short__mmx ( ogg_int16_t *InputData, ogg_int16_t *OutputData)
{
ogg_int16_t __attribute__((aligned(8))) temp[8*8];
__asm__ __volatile__ (
" .balign 16 \n\t"
/*
* Input data is an 8x8 block. To process it efficiently, each pass
* transposes one 4x8 half of the block so that Fdct_mmx can transform
* four rows (or columns) in parallel, one per 16-bit lane.
*/
Transpose_mmx ( (%0), 16(%0), 32(%0), 48(%0), 8(%0), 24(%0), 40(%0), 56(%0),
(%1), 16(%1), 32(%1), 48(%1), 8(%1), 24(%1), 40(%1), 56(%1))
Fdct_mmx ( (%1), 16(%1), 32(%1), 48(%1), 8(%1), 24(%1), 40(%1), 56(%1), (%2))
Transpose_mmx (64(%0), 80(%0), 96(%0),112(%0), 72(%0), 88(%0),104(%0),120(%0),
64(%1), 80(%1), 96(%1),112(%1), 72(%1), 88(%1),104(%1),120(%1))
Fdct_mmx (64(%1), 80(%1), 96(%1),112(%1), 72(%1), 88(%1),104(%1),120(%1), (%2))
Transpose_mmx ( 0(%1), 16(%1), 32(%1), 48(%1), 64(%1), 80(%1), 96(%1),112(%1),
0(%1), 16(%1), 32(%1), 48(%1), 64(%1), 80(%1), 96(%1),112(%1))
Fdct_mmx ( 0(%1), 16(%1), 32(%1), 48(%1), 64(%1), 80(%1), 96(%1),112(%1), (%2))
Transpose_mmx ( 8(%1), 24(%1), 40(%1), 56(%1), 72(%1), 88(%1),104(%1),120(%1),
8(%1), 24(%1), 40(%1), 56(%1), 72(%1), 88(%1),104(%1),120(%1))
Fdct_mmx ( 8(%1), 24(%1), 40(%1), 56(%1), 72(%1), 88(%1),104(%1),120(%1), (%2))
" emms \n\t"
: "+r" (InputData),
"+r" (OutputData)
: "r" (temp),
[xC1S7] "m" (xC1S7), /* gcc 3.1+ allows named asm parameters */
[xC2S6] "m" (xC2S6),
[xC3S5] "m" (xC3S5),
[xC4S4] "m" (xC4S4),
[xC5S3] "m" (xC5S3),
[xC6S2] "m" (xC6S2),
[xC7S1] "m" (xC7S1)
: "memory"
);
}
/* install our implementation in the function table */
void dsp_mmx_fdct_init(DspFunctions *funcs)
{
funcs->fdct_short = fdct_short__mmx;
}
#endif /* USE_ASM */
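The recurring pmulhw / paddw / psrlw $15 / paddw sequence in Fdct_mmx can be modeled per 16-bit lane in plain C. The sketch below is an illustration under stated assumptions, not libtheora code: mul_cos16 and floor_div_65536 are hypothetical names, and the model covers only constants of 0x8000 and above (such as xC4S4 = 0xb505), where pmulhw sees the constant as negative and the input has to be added back.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 65536) without relying on right-shifting a negative value. */
static int32_t floor_div_65536(int32_t v){
  return v>=0?v/65536:-((-v+65535)/65536);
}

/* Hypothetical one-lane model of the asm sequence for a 16-bit constant c
   in [0x8000,0xFFFF] that represents c/65536. */
int16_t mul_cos16(int16_t x,uint16_t c){
  int32_t c_signed;
  int32_t hi;
  assert(c>=0x8000);
  /* pmulhw treats c as signed, i.e. as c - 65536. */
  c_signed=(int32_t)c-65536;
  /* pmulhw: high 16 bits of the signed product = floor(x*c/65536) - x. */
  hi=floor_div_65536((int32_t)x*c_signed);
  /* paddw x restores floor(x*c/65536); adding the sign bit of x
     (psrlw $15 on a copy of x) moves negative results toward zero. */
  return (int16_t)(hi+x+(x<0));
}
```

For constants below 0x8000 (xC6S2, xC7S1) the macro skips the add-back and only adds the sign bit, since pmulhw already yields floor(x*c/65536) there.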

@ -1,27 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: idct_mmx.c 15397 2008-10-14 02:06:24Z tterribe $
********************************************************************/
#include "../codec_internal.h"
#if defined(USE_ASM)
/* nothing implemented right now */
void dsp_mmx_idct_init(DspFunctions *funcs)
{
}
#endif /* USE_ASM */

@ -1,184 +0,0 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: recon_mmx.c 15397 2008-10-14 02:06:24Z tterribe $
********************************************************************/
#include "../codec_internal.h"
#if defined(USE_ASM)
typedef unsigned long long ogg_uint64_t;
static const __attribute__ ((aligned(8),used)) ogg_int64_t V128 = 0x8080808080808080LL;
static void copy8x8__mmx (unsigned char *src,
unsigned char *dest,
ogg_uint32_t stride)
{
__asm__ __volatile__ (
" .balign 16 \n\t"
" lea (%2, %2, 2), %%rdi \n\t"
" movq (%1), %%mm0 \n\t"
" movq (%1, %2), %%mm1 \n\t"
" movq (%1, %2, 2), %%mm2 \n\t"
" movq (%1, %%rdi), %%mm3 \n\t"
" lea (%1, %2, 4), %1 \n\t"
" movq %%mm0, (%0) \n\t"
" movq %%mm1, (%0, %2) \n\t"
" movq %%mm2, (%0, %2, 2) \n\t"
" movq %%mm3, (%0, %%rdi) \n\t"
" lea (%0, %2, 4), %0 \n\t"
" movq (%1), %%mm0 \n\t"
" movq (%1, %2), %%mm1 \n\t"
" movq (%1, %2, 2), %%mm2 \n\t"
" movq (%1, %%rdi), %%mm3 \n\t"
" movq %%mm0, (%0) \n\t"
" movq %%mm1, (%0, %2) \n\t"
" movq %%mm2, (%0, %2, 2) \n\t"
" movq %%mm3, (%0, %%rdi) \n\t"
: "+a" (dest),
"+c" (src) /* advanced by the asm, so it must be an in/out operand */
: "d" ((ogg_uint64_t)stride)
: "memory", "rdi"
);
}
static void recon_intra8x8__mmx (unsigned char *ReconPtr, ogg_int16_t *ChangePtr,
ogg_uint32_t LineStep)
{
__asm__ __volatile__ (
" .balign 16 \n\t"
" movq %[V128], %%mm0 \n\t" /* Set mm0 to 0x8080808080808080 */
" lea 128(%1), %%rdi \n\t" /* Endpoint in input buffer */
"1: \n\t"
" movq (%1), %%mm2 \n\t" /* First four input values */
" packsswb 8(%1), %%mm2 \n\t" /* pack with next(high) four values */
" por %%mm0, %%mm0 \n\t"
" pxor %%mm0, %%mm2 \n\t" /* Convert result to unsigned (same as add 128) */
" lea 16(%1), %1 \n\t" /* Step source buffer */
" cmp %%rdi, %1 \n\t" /* are we done */
" movq %%mm2, (%0) \n\t" /* store results */
" lea (%0, %2), %0 \n\t" /* Step output buffer */
" jc 1b \n\t" /* Loop back if we are not done */
: "+r" (ReconPtr),
"+r" (ChangePtr) /* stepped by the asm, so it must be an in/out operand */
: "r" ((ogg_uint64_t)LineStep),
[V128] "m" (V128)
: "memory", "rdi"
);
}
static void recon_inter8x8__mmx (unsigned char *ReconPtr, unsigned char *RefPtr,
ogg_int16_t *ChangePtr, ogg_uint32_t LineStep)
{
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm0, %%mm0 \n\t"
" lea 128(%1), %%rdi \n\t"
"1: \n\t"
" movq (%2), %%mm2 \n\t" /* (+3 misaligned) 8 reference pixels */
" movq (%1), %%mm4 \n\t" /* first 4 changes */
" movq %%mm2, %%mm3 \n\t"
" movq 8(%1), %%mm5 \n\t" /* last 4 changes */
" punpcklbw %%mm0, %%mm2 \n\t" /* turn first 4 refs into positive 16-bit #s */
" paddsw %%mm4, %%mm2 \n\t" /* add in first 4 changes */
" punpckhbw %%mm0, %%mm3 \n\t" /* turn last 4 refs into positive 16-bit #s */
" paddsw %%mm5, %%mm3 \n\t" /* add in last 4 changes */
" add %3, %2 \n\t" /* next row of reference pixels */
" packuswb %%mm3, %%mm2 \n\t" /* pack result to unsigned 8-bit values */
" lea 16(%1), %1 \n\t" /* next row of changes */
" cmp %%rdi, %1 \n\t" /* are we done? */
" movq %%mm2, (%0) \n\t" /* store result */
" lea (%0, %3), %0 \n\t" /* next row of output */
" jc 1b \n\t"
: "+r" (ReconPtr),
"+r" (ChangePtr), /* stepped by the asm, so these must be in/out operands */
"+r" (RefPtr)
: "r" ((ogg_uint64_t)LineStep)
: "memory", "rdi"
);
}
static void recon_inter8x8_half__mmx (unsigned char *ReconPtr, unsigned char *RefPtr1,
unsigned char *RefPtr2, ogg_int16_t *ChangePtr,
ogg_uint32_t LineStep)
{
__asm__ __volatile__ (
" .balign 16 \n\t"
" pxor %%mm0, %%mm0 \n\t"
" lea 128(%1), %%rdi \n\t"
"1: \n\t"
" movq (%2), %%mm2 \n\t" /* (+3 misaligned) 8 reference pixels */
" movq (%3), %%mm4 \n\t" /* (+3 misaligned) 8 reference pixels */
" movq %%mm2, %%mm3 \n\t"
" punpcklbw %%mm0, %%mm2 \n\t" /* mm2 = start ref1 as positive 16-bit #s */
" movq %%mm4, %%mm5 \n\t"
" movq (%1), %%mm6 \n\t" /* first 4 changes */
" punpckhbw %%mm0, %%mm3 \n\t" /* mm3 = end ref1 as positive 16-bit #s */
" movq 8(%1), %%mm7 \n\t" /* last 4 changes */
" punpcklbw %%mm0, %%mm4 \n\t" /* mm4 = start ref2 as positive 16-bit #s */
" punpckhbw %%mm0, %%mm5 \n\t" /* mm5 = end ref2 as positive 16-bit #s */
" paddw %%mm4, %%mm2 \n\t" /* mm2 = start (ref1 + ref2) */
" paddw %%mm5, %%mm3 \n\t" /* mm3 = end (ref1 + ref2) */
" psrlw $1, %%mm2 \n\t" /* mm2 = start (ref1 + ref2)/2 */
" psrlw $1, %%mm3 \n\t" /* mm3 = end (ref1 + ref2)/2 */
" paddw %%mm6, %%mm2 \n\t" /* add changes to start */
" paddw %%mm7, %%mm3 \n\t" /* add changes to end */
" lea 16(%1), %1 \n\t" /* next row of changes */
" packuswb %%mm3, %%mm2 \n\t" /* pack start|end to unsigned 8-bit */
" add %4, %2 \n\t" /* next row of reference pixels */
" add %4, %3 \n\t" /* next row of reference pixels */
" movq %%mm2, (%0) \n\t" /* store result */
" add %4, %0 \n\t" /* next row of output */
" cmp %%rdi, %1 \n\t" /* are we done? */
" jc 1b \n\t"
: "+r" (ReconPtr),
"+r" (ChangePtr), /* stepped by the asm, so these must be in/out operands */
"+r" (RefPtr1),
"+r" (RefPtr2)
: "r" ((ogg_uint64_t)LineStep)
: "memory", "rdi"
);
}
void dsp_mmx_recon_init(DspFunctions *funcs)
{
funcs->copy8x8 = copy8x8__mmx;
funcs->recon_intra8x8 = recon_intra8x8__mmx;
funcs->recon_inter8x8 = recon_inter8x8__mmx;
funcs->recon_inter8x8_half = recon_inter8x8_half__mmx;
}
#endif /* USE_ASM */

@ -0,0 +1,168 @@
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include "apiwrapper.h"
#include "encint.h"
#include "theora/theoraenc.h"
static void th_enc_api_clear(th_api_wrapper *_api){
if(_api->encode)th_encode_free(_api->encode);
memset(_api,0,sizeof(*_api));
}
static void theora_encode_clear(theora_state *_te){
if(_te->i!=NULL)theora_info_clear(_te->i);
memset(_te,0,sizeof(*_te));
}
static int theora_encode_control(theora_state *_te,int _req,
void *_buf,size_t _buf_sz){
return th_encode_ctl(((th_api_wrapper *)_te->i->codec_setup)->encode,
_req,_buf,_buf_sz);
}
static ogg_int64_t theora_encode_granule_frame(theora_state *_te,
ogg_int64_t _gp){
return th_granule_frame(((th_api_wrapper *)_te->i->codec_setup)->encode,_gp);
}
static double theora_encode_granule_time(theora_state *_te,ogg_int64_t _gp){
return th_granule_time(((th_api_wrapper *)_te->i->codec_setup)->encode,_gp);
}
static const oc_state_dispatch_vtable OC_ENC_DISPATCH_VTBL={
(oc_state_clear_func)theora_encode_clear,
(oc_state_control_func)theora_encode_control,
(oc_state_granule_frame_func)theora_encode_granule_frame,
(oc_state_granule_time_func)theora_encode_granule_time,
};
int theora_encode_init(theora_state *_te,theora_info *_ci){
th_api_info *apiinfo;
th_info info;
ogg_uint32_t keyframe_frequency_force;
/*Allocate our own combined API wrapper/theora_info struct.
We put them both in one malloc'd block so that when the API wrapper is
freed, the info struct goes with it.
This avoids having to figure out whether or not we need to free the info
struct in either theora_info_clear() or theora_clear().*/
apiinfo=(th_api_info *)_ogg_malloc(sizeof(*apiinfo));
if(apiinfo==NULL)return TH_EFAULT;
/*Make our own copy of the info struct, since its lifetime should be
independent of the one we were passed in.*/
*&apiinfo->info=*_ci;
oc_theora_info2th_info(&info,_ci);
apiinfo->api.encode=th_encode_alloc(&info);
if(apiinfo->api.encode==NULL){
_ogg_free(apiinfo);
return OC_EINVAL;
}
apiinfo->api.clear=(oc_setup_clear_func)th_enc_api_clear;
/*Provide entry points for ABI compatibility with old decoder shared libs.*/
_te->internal_encode=(void *)&OC_ENC_DISPATCH_VTBL;
_te->internal_decode=NULL;
_te->granulepos=0;
_te->i=&apiinfo->info;
_te->i->codec_setup=&apiinfo->api;
/*Set the precise requested keyframe frequency.*/
keyframe_frequency_force=_ci->keyframe_auto_p?
_ci->keyframe_frequency_force:_ci->keyframe_frequency;
th_encode_ctl(apiinfo->api.encode,
TH_ENCCTL_SET_KEYFRAME_FREQUENCY_FORCE,
&keyframe_frequency_force,sizeof(keyframe_frequency_force));
/*TODO: Additional codec setup using the extra fields in theora_info.*/
return 0;
}
int theora_encode_YUVin(theora_state *_te,yuv_buffer *_yuv){
th_api_wrapper *api;
th_ycbcr_buffer buf;
int ret;
api=(th_api_wrapper *)_te->i->codec_setup;
buf[0].width=_yuv->y_width;
buf[0].height=_yuv->y_height;
buf[0].stride=_yuv->y_stride;
buf[0].data=_yuv->y;
buf[1].width=_yuv->uv_width;
buf[1].height=_yuv->uv_height;
buf[1].stride=_yuv->uv_stride;
buf[1].data=_yuv->u;
buf[2].width=_yuv->uv_width;
buf[2].height=_yuv->uv_height;
buf[2].stride=_yuv->uv_stride;
buf[2].data=_yuv->v;
ret=th_encode_ycbcr_in(api->encode,buf);
if(ret<0)return ret;
_te->granulepos=api->encode->state.granpos;
return ret;
}
int theora_encode_packetout(theora_state *_te,int _last_p,ogg_packet *_op){
th_api_wrapper *api;
api=(th_api_wrapper *)_te->i->codec_setup;
return th_encode_packetout(api->encode,_last_p,_op);
}
int theora_encode_header(theora_state *_te,ogg_packet *_op){
oc_enc_ctx *enc;
th_api_wrapper *api;
int ret;
api=(th_api_wrapper *)_te->i->codec_setup;
enc=api->encode;
/*If we've already started encoding, fail.*/
if(enc->packet_state>OC_PACKET_EMPTY||enc->state.granpos!=0){
return TH_EINVAL;
}
/*Reset the state to make sure we output an info packet.*/
enc->packet_state=OC_PACKET_INFO_HDR;
ret=th_encode_flushheader(api->encode,NULL,_op);
return ret>=0?0:ret;
}
int theora_encode_comment(theora_comment *_tc,ogg_packet *_op){
oggpack_buffer opb;
void *buf;
int packet_state;
int ret;
packet_state=OC_PACKET_COMMENT_HDR;
oggpackB_writeinit(&opb);
ret=oc_state_flushheader(NULL,&packet_state,&opb,NULL,NULL,
th_version_string(),(th_comment *)_tc,_op);
if(ret>=0){
/*The oggpack_buffer's lifetime ends with this function, so we have to
copy out the packet contents.
Presumably the application knows it is supposed to free this.
This part works nothing like the Vorbis API, and the documentation on it
has been wrong for some time, claiming libtheora owned the memory.*/
buf=_ogg_malloc(_op->bytes);
if(buf==NULL){
_op->packet=NULL;
ret=TH_EFAULT;
}
else{
memcpy(buf,_op->packet,_op->bytes);
_op->packet=buf;
ret=0;
}
}
oggpackB_writeclear(&opb);
return ret;
}
int theora_encode_tables(theora_state *_te,ogg_packet *_op){
oc_enc_ctx *enc;
th_api_wrapper *api;
int ret;
api=(th_api_wrapper *)_te->i->codec_setup;
enc=api->encode;
/*If we've already started encoding, fail.*/
if(enc->packet_state>OC_PACKET_EMPTY||enc->state.granpos!=0){
return TH_EINVAL;
}
/*Reset the state to make sure we output a setup packet.*/
enc->packet_state=OC_PACKET_SETUP_HDR;
ret=th_encode_flushheader(api->encode,NULL,_op);
return ret>=0?0:ret;
}
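The comment in theora_encode_init describes placing the API wrapper and the info struct in one malloc'd block so that freeing the wrapper also releases the info. The sketch below illustrates that ownership pattern in isolation. All demo_* names are hypothetical stand-ins, not the real th_api_wrapper or theora_info types, and it assumes the wrapper is the first member of the combined struct so that its address is also the address of the block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the API wrapper. */
typedef struct demo_api demo_api;
struct demo_api{
  void (*clear)(demo_api *_api);
  void  *encoder;
};

/* Hypothetical stand-in for the public info struct. */
typedef struct{
  int   width;
  int   height;
  void *codec_setup;
}demo_info;

/* Wrapper first: a pointer to api is also a pointer to the whole block. */
typedef struct{
  demo_api  api;
  demo_info info;
}demo_api_info;

/* Copy the caller's info into a combined block and link it to its wrapper. */
demo_info *demo_info_dup(const demo_info *_src){
  demo_api_info *blk;
  assert(_src!=NULL);
  blk=(demo_api_info *)malloc(sizeof(*blk));
  if(blk==NULL)return NULL;
  memset(&blk->api,0,sizeof(blk->api));
  blk->info=*_src;
  blk->info.codec_setup=&blk->api;
  return &blk->info;
}

/* Freeing the wrapper releases the info copy with it: one allocation. */
void demo_info_free(demo_info *_info){
  if(_info!=NULL)free(_info->codec_setup);
}
```

With this layout the cleanup path never has to decide whether the info struct was separately allocated; releasing codec_setup is always sufficient.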

@ -0,0 +1,388 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: encfrag.c 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include "encint.h"
void oc_enc_frag_sub(const oc_enc_ctx *_enc,ogg_int16_t _diff[64],
const unsigned char *_src,const unsigned char *_ref,int _ystride){
(*_enc->opt_vtable.frag_sub)(_diff,_src,_ref,_ystride);
}
void oc_enc_frag_sub_c(ogg_int16_t _diff[64],const unsigned char *_src,
const unsigned char *_ref,int _ystride){
int i;
for(i=0;i<8;i++){
int j;
for(j=0;j<8;j++)_diff[i*8+j]=(ogg_int16_t)(_src[j]-_ref[j]);
_src+=_ystride;
_ref+=_ystride;
}
}
void oc_enc_frag_sub_128(const oc_enc_ctx *_enc,ogg_int16_t _diff[64],
const unsigned char *_src,int _ystride){
(*_enc->opt_vtable.frag_sub_128)(_diff,_src,_ystride);
}
void oc_enc_frag_sub_128_c(ogg_int16_t *_diff,
const unsigned char *_src,int _ystride){
int i;
for(i=0;i<8;i++){
int j;
for(j=0;j<8;j++)_diff[i*8+j]=(ogg_int16_t)(_src[j]-128);
_src+=_ystride;
}
}
unsigned oc_enc_frag_sad(const oc_enc_ctx *_enc,const unsigned char *_x,
const unsigned char *_y,int _ystride){
return (*_enc->opt_vtable.frag_sad)(_x,_y,_ystride);
}
unsigned oc_enc_frag_sad_c(const unsigned char *_src,
const unsigned char *_ref,int _ystride){
unsigned sad;
int i;
sad=0;
for(i=8;i-->0;){
int j;
for(j=0;j<8;j++)sad+=abs(_src[j]-_ref[j]);
_src+=_ystride;
_ref+=_ystride;
}
return sad;
}
unsigned oc_enc_frag_sad_thresh(const oc_enc_ctx *_enc,
const unsigned char *_src,const unsigned char *_ref,int _ystride,
unsigned _thresh){
return (*_enc->opt_vtable.frag_sad_thresh)(_src,_ref,_ystride,_thresh);
}
unsigned oc_enc_frag_sad_thresh_c(const unsigned char *_src,
const unsigned char *_ref,int _ystride,unsigned _thresh){
unsigned sad;
int i;
sad=0;
for(i=8;i-->0;){
int j;
for(j=0;j<8;j++)sad+=abs(_src[j]-_ref[j]);
if(sad>_thresh)break;
_src+=_ystride;
_ref+=_ystride;
}
return sad;
}
unsigned oc_enc_frag_sad2_thresh(const oc_enc_ctx *_enc,
const unsigned char *_src,const unsigned char *_ref1,
const unsigned char *_ref2,int _ystride,unsigned _thresh){
return (*_enc->opt_vtable.frag_sad2_thresh)(_src,_ref1,_ref2,_ystride,
_thresh);
}
unsigned oc_enc_frag_sad2_thresh_c(const unsigned char *_src,
const unsigned char *_ref1,const unsigned char *_ref2,int _ystride,
unsigned _thresh){
unsigned sad;
int i;
sad=0;
for(i=8;i-->0;){
int j;
for(j=0;j<8;j++)sad+=abs(_src[j]-((_ref1[j]+_ref2[j])>>1));
if(sad>_thresh)break;
_src+=_ystride;
_ref1+=_ystride;
_ref2+=_ystride;
}
return sad;
}
static void oc_diff_hadamard(ogg_int16_t _buf[64],const unsigned char *_src,
const unsigned char *_ref,int _ystride){
int i;
for(i=0;i<8;i++){
int t0;
int t1;
int t2;
int t3;
int t4;
int t5;
int t6;
int t7;
int r;
/*Hadamard stage 1:*/
t0=_src[0]-_ref[0]+_src[4]-_ref[4];
t4=_src[0]-_ref[0]-_src[4]+_ref[4];
t1=_src[1]-_ref[1]+_src[5]-_ref[5];
t5=_src[1]-_ref[1]-_src[5]+_ref[5];
t2=_src[2]-_ref[2]+_src[6]-_ref[6];
t6=_src[2]-_ref[2]-_src[6]+_ref[6];
t3=_src[3]-_ref[3]+_src[7]-_ref[7];
t7=_src[3]-_ref[3]-_src[7]+_ref[7];
/*Hadamard stage 2:*/
r=t0;
t0+=t2;
t2=r-t2;
r=t1;
t1+=t3;
t3=r-t3;
r=t4;
t4+=t6;
t6=r-t6;
r=t5;
t5+=t7;
t7=r-t7;
/*Hadamard stage 3:*/
_buf[0*8+i]=(ogg_int16_t)(t0+t1);
_buf[1*8+i]=(ogg_int16_t)(t0-t1);
_buf[2*8+i]=(ogg_int16_t)(t2+t3);
_buf[3*8+i]=(ogg_int16_t)(t2-t3);
_buf[4*8+i]=(ogg_int16_t)(t4+t5);
_buf[5*8+i]=(ogg_int16_t)(t4-t5);
_buf[6*8+i]=(ogg_int16_t)(t6+t7);
_buf[7*8+i]=(ogg_int16_t)(t6-t7);
_src+=_ystride;
_ref+=_ystride;
}
}
static void oc_diff_hadamard2(ogg_int16_t _buf[64],const unsigned char *_src,
const unsigned char *_ref1,const unsigned char *_ref2,int _ystride){
int i;
for(i=0;i<8;i++){
int t0;
int t1;
int t2;
int t3;
int t4;
int t5;
int t6;
int t7;
int r;
/*Hadamard stage 1:*/
r=(_ref1[0]+_ref2[0])>>1;
t4=(_ref1[4]+_ref2[4])>>1;
t0=_src[0]-r+_src[4]-t4;
t4=_src[0]-r-_src[4]+t4;
r=(_ref1[1]+_ref2[1])>>1;
t5=(_ref1[5]+_ref2[5])>>1;
t1=_src[1]-r+_src[5]-t5;
t5=_src[1]-r-_src[5]+t5;
r=(_ref1[2]+_ref2[2])>>1;
t6=(_ref1[6]+_ref2[6])>>1;
t2=_src[2]-r+_src[6]-t6;
t6=_src[2]-r-_src[6]+t6;
r=(_ref1[3]+_ref2[3])>>1;
t7=(_ref1[7]+_ref2[7])>>1;
t3=_src[3]-r+_src[7]-t7;
t7=_src[3]-r-_src[7]+t7;
/*Hadamard stage 2:*/
r=t0;
t0+=t2;
t2=r-t2;
r=t1;
t1+=t3;
t3=r-t3;
r=t4;
t4+=t6;
t6=r-t6;
r=t5;
t5+=t7;
t7=r-t7;
/*Hadamard stage 3:*/
_buf[0*8+i]=(ogg_int16_t)(t0+t1);
_buf[1*8+i]=(ogg_int16_t)(t0-t1);
_buf[2*8+i]=(ogg_int16_t)(t2+t3);
_buf[3*8+i]=(ogg_int16_t)(t2-t3);
_buf[4*8+i]=(ogg_int16_t)(t4+t5);
_buf[5*8+i]=(ogg_int16_t)(t4-t5);
_buf[6*8+i]=(ogg_int16_t)(t6+t7);
_buf[7*8+i]=(ogg_int16_t)(t6-t7);
_src+=_ystride;
_ref1+=_ystride;
_ref2+=_ystride;
}
}
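/*Applies an 8-point Hadamard transform to each row of the source pixels
themselves (no reference is subtracted) and stores the result transposed in
_buf.*/
static void oc_intra_hadamard(ogg_int16_t _buf[64],const unsigned char *_src,
int _ystride){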
int i;
for(i=0;i<8;i++){
int t0;
int t1;
int t2;
int t3;
int t4;
int t5;
int t6;
int t7;
int r;
/*Hadamard stage 1:*/
t0=_src[0]+_src[4];
t4=_src[0]-_src[4];
t1=_src[1]+_src[5];
t5=_src[1]-_src[5];
t2=_src[2]+_src[6];
t6=_src[2]-_src[6];
t3=_src[3]+_src[7];
t7=_src[3]-_src[7];
/*Hadamard stage 2:*/
r=t0;
t0+=t2;
t2=r-t2;
r=t1;
t1+=t3;
t3=r-t3;
r=t4;
t4+=t6;
t6=r-t6;
r=t5;
t5+=t7;
t7=r-t7;
/*Hadamard stage 3:*/
_buf[0*8+i]=(ogg_int16_t)(t0+t1);
_buf[1*8+i]=(ogg_int16_t)(t0-t1);
_buf[2*8+i]=(ogg_int16_t)(t2+t3);
_buf[3*8+i]=(ogg_int16_t)(t2-t3);
_buf[4*8+i]=(ogg_int16_t)(t4+t5);
_buf[5*8+i]=(ogg_int16_t)(t4-t5);
_buf[6*8+i]=(ogg_int16_t)(t6+t7);
_buf[7*8+i]=(ogg_int16_t)(t6-t7);
_src+=_ystride;
}
}
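/*Applies an 8-point Hadamard transform to each row of _buf and accumulates
the absolute values of the results.
The loop stops early once the running sum exceeds _thresh, so in that case
the return value is only a lower bound on the full sum.*/
unsigned oc_hadamard_sad_thresh(const ogg_int16_t _buf[64],unsigned _thresh){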
unsigned sad;
int t0;
int t1;
int t2;
int t3;
int t4;
int t5;
int t6;
int t7;
int r;
int i;
sad=0;
for(i=0;i<8;i++){
/*Hadamard stage 1:*/
t0=_buf[i*8+0]+_buf[i*8+4];
t4=_buf[i*8+0]-_buf[i*8+4];
t1=_buf[i*8+1]+_buf[i*8+5];
t5=_buf[i*8+1]-_buf[i*8+5];
t2=_buf[i*8+2]+_buf[i*8+6];
t6=_buf[i*8+2]-_buf[i*8+6];
t3=_buf[i*8+3]+_buf[i*8+7];
t7=_buf[i*8+3]-_buf[i*8+7];
/*Hadamard stage 2:*/
r=t0;
t0+=t2;
t2=r-t2;
r=t1;
t1+=t3;
t3=r-t3;
r=t4;
t4+=t6;
t6=r-t6;
r=t5;
t5+=t7;
t7=r-t7;
/*Hadamard stage 3:*/
r=abs(t0+t1);
r+=abs(t0-t1);
r+=abs(t2+t3);
r+=abs(t2-t3);
r+=abs(t4+t5);
r+=abs(t4-t5);
r+=abs(t6+t7);
r+=abs(t6-t7);
sad+=r;
if(sad>_thresh)break;
}
return sad;
}
unsigned oc_enc_frag_satd_thresh(const oc_enc_ctx *_enc,
const unsigned char *_src,const unsigned char *_ref,int _ystride,
unsigned _thresh){
return (*_enc->opt_vtable.frag_satd_thresh)(_src,_ref,_ystride,_thresh);
}
unsigned oc_enc_frag_satd_thresh_c(const unsigned char *_src,
const unsigned char *_ref,int _ystride,unsigned _thresh){
ogg_int16_t buf[64];
oc_diff_hadamard(buf,_src,_ref,_ystride);
return oc_hadamard_sad_thresh(buf,_thresh);
}
unsigned oc_enc_frag_satd2_thresh(const oc_enc_ctx *_enc,
const unsigned char *_src,const unsigned char *_ref1,
const unsigned char *_ref2,int _ystride,unsigned _thresh){
return (*_enc->opt_vtable.frag_satd2_thresh)(_src,_ref1,_ref2,_ystride,
_thresh);
}
unsigned oc_enc_frag_satd2_thresh_c(const unsigned char *_src,
const unsigned char *_ref1,const unsigned char *_ref2,int _ystride,
unsigned _thresh){
ogg_int16_t buf[64];
oc_diff_hadamard2(buf,_src,_ref1,_ref2,_ystride);
return oc_hadamard_sad_thresh(buf,_thresh);
}
unsigned oc_enc_frag_intra_satd(const oc_enc_ctx *_enc,
const unsigned char *_src,int _ystride){
return (*_enc->opt_vtable.frag_intra_satd)(_src,_ystride);
}
unsigned oc_enc_frag_intra_satd_c(const unsigned char *_src,int _ystride){
ogg_int16_t buf[64];
oc_intra_hadamard(buf,_src,_ystride);
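/*buf[0] through buf[7] hold the row sums, so their total is the DC
coefficient of the 2D transform.
Subtract its magnitude so that only the AC terms are counted.*/
return oc_hadamard_sad_thresh(buf,UINT_MAX)
-abs(buf[0]+buf[1]+buf[2]+buf[3]+buf[4]+buf[5]+buf[6]+buf[7]);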
}
void oc_enc_frag_copy2(const oc_enc_ctx *_enc,unsigned char *_dst,
const unsigned char *_src1,const unsigned char *_src2,int _ystride){
(*_enc->opt_vtable.frag_copy2)(_dst,_src1,_src2,_ystride);
}
void oc_enc_frag_copy2_c(unsigned char *_dst,
const unsigned char *_src1,const unsigned char *_src2,int _ystride){
int i;
int j;
for(i=8;i-->0;){
for(j=0;j<8;j++)_dst[j]=_src1[j]+_src2[j]>>1;
_dst+=_ystride;
_src1+=_ystride;
_src2+=_ystride;
}
}
void oc_enc_frag_recon_intra(const oc_enc_ctx *_enc,
unsigned char *_dst,int _ystride,const ogg_int16_t _residue[64]){
(*_enc->opt_vtable.frag_recon_intra)(_dst,_ystride,_residue);
}
void oc_enc_frag_recon_inter(const oc_enc_ctx *_enc,unsigned char *_dst,
const unsigned char *_src,int _ystride,const ogg_int16_t _residue[64]){
(*_enc->opt_vtable.frag_recon_inter)(_dst,_src,_ystride,_residue);
}
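
The SATD routines in the file above work in two passes: a Hadamard transform on each row of the difference block, stored transposed, followed by a second transform on the rows of that buffer while absolute values are summed. The following is a minimal standalone sketch of that idea, not the library code. The names `hadamard8`, `satd8x8_sketch` and `satd_self_check` are hypothetical, the buffer uses plain `int` rather than `ogg_int16_t`, and the early-out threshold is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/*Hypothetical helper: in-place, unnormalized 8-point Hadamard transform,
   written as three butterfly stages with strides 4, 2 and 1.*/
static void hadamard8(int v[8]){
  int len;
  for(len=4;len>=1;len>>=1){
    int i;
    for(i=0;i<8;i+=2*len){
      int j;
      for(j=i;j<i+len;j++){
        int a=v[j];
        int b=v[j+len];
        v[j]=a+b;
        v[j+len]=a-b;
      }
    }
  }
}

/*Sketch of an 8x8 SATD between two blocks that share one stride.*/
static unsigned satd8x8_sketch(const unsigned char *src,
 const unsigned char *ref,int ystride){
  int buf[64];
  int row[8];
  unsigned satd;
  int i;
  int j;
  /*Pass 1: transform each row of the difference and store it transposed.*/
  for(i=0;i<8;i++){
    for(j=0;j<8;j++)row[j]=src[i*ystride+j]-ref[i*ystride+j];
    hadamard8(row);
    for(j=0;j<8;j++)buf[j*8+i]=row[j];
  }
  /*Pass 2: transform each row of the transposed buffer and accumulate.*/
  satd=0;
  for(i=0;i<8;i++){
    hadamard8(buf+i*8);
    for(j=0;j<8;j++)satd+=(unsigned)abs(buf[i*8+j]);
  }
  return satd;
}

/*Two easy sanity checks for the sketch.*/
static void satd_self_check(void){
  unsigned char a[64];
  unsigned char b[64];
  int i;
  for(i=0;i<64;i++){
    a[i]=10;
    b[i]=7;
  }
  /*A flat difference of 3 puts all of the energy in the DC term: 64*3.*/
  assert(satd8x8_sketch(a,b,8)==192);
  /*Identical blocks have no residual at all.*/
  assert(satd8x8_sketch(a,a,8)==0);
}
```

Because the transform is unnormalized, a single pixel that differs by 1 contributes a magnitude of 1 to every one of the 64 coefficients, which is one reason SATD tends to penalize isolated errors more heavily than a plain SAD does.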


@ -0,0 +1,121 @@
#include <stdlib.h>
#include <string.h>
#include "internal.h"
#include "enquant.h"
#include "huffenc.h"
/*Packs a series of octets from a given byte array into the pack buffer.
_opb: The pack buffer to store the octets in.
_buf: The byte array containing the bytes to pack.
_len: The number of octets to pack.*/
static void oc_pack_octets(oggpack_buffer *_opb,const char *_buf,int _len){
int i;
for(i=0;i<_len;i++)oggpackB_write(_opb,_buf[i],8);
}
int oc_state_flushheader(oc_theora_state *_state,int *_packet_state,
oggpack_buffer *_opb,const th_quant_info *_qinfo,
const th_huff_code _codes[TH_NHUFFMAN_TABLES][TH_NDCT_TOKENS],
const char *_vendor,th_comment *_tc,ogg_packet *_op){
unsigned char *packet;
int b_o_s;
if(_op==NULL)return TH_EFAULT;
switch(*_packet_state){
/*Codec info header.*/
case OC_PACKET_INFO_HDR:{
if(_state==NULL)return TH_EFAULT;
oggpackB_reset(_opb);
/*Mark this packet as the info header.*/
oggpackB_write(_opb,0x80,8);
/*Write the codec string.*/
oc_pack_octets(_opb,"theora",6);
/*Write the codec bitstream version.*/
oggpackB_write(_opb,TH_VERSION_MAJOR,8);
oggpackB_write(_opb,TH_VERSION_MINOR,8);
oggpackB_write(_opb,TH_VERSION_SUB,8);
/*Describe the encoded frame.*/
oggpackB_write(_opb,_state->info.frame_width>>4,16);
oggpackB_write(_opb,_state->info.frame_height>>4,16);
oggpackB_write(_opb,_state->info.pic_width,24);
oggpackB_write(_opb,_state->info.pic_height,24);
oggpackB_write(_opb,_state->info.pic_x,8);
oggpackB_write(_opb,_state->info.pic_y,8);
oggpackB_write(_opb,_state->info.fps_numerator,32);
oggpackB_write(_opb,_state->info.fps_denominator,32);
oggpackB_write(_opb,_state->info.aspect_numerator,24);
oggpackB_write(_opb,_state->info.aspect_denominator,24);
oggpackB_write(_opb,_state->info.colorspace,8);
oggpackB_write(_opb,_state->info.target_bitrate,24);
oggpackB_write(_opb,_state->info.quality,6);
oggpackB_write(_opb,_state->info.keyframe_granule_shift,5);
oggpackB_write(_opb,_state->info.pixel_fmt,2);
/*Spare configuration bits.*/
oggpackB_write(_opb,0,3);
b_o_s=1;
}break;
/*Comment header.*/
case OC_PACKET_COMMENT_HDR:{
int vendor_len;
int i;
if(_tc==NULL)return TH_EFAULT;
vendor_len=strlen(_vendor);
oggpackB_reset(_opb);
/*Mark this packet as the comment header.*/
oggpackB_write(_opb,0x81,8);
/*Write the codec string.*/
oc_pack_octets(_opb,"theora",6);
/*Write the vendor string.*/
oggpack_write(_opb,vendor_len,32);
oc_pack_octets(_opb,_vendor,vendor_len);
oggpack_write(_opb,_tc->comments,32);
for(i=0;i<_tc->comments;i++){
if(_tc->user_comments[i]!=NULL){
oggpack_write(_opb,_tc->comment_lengths[i],32);
oc_pack_octets(_opb,_tc->user_comments[i],_tc->comment_lengths[i]);
}
else oggpack_write(_opb,0,32);
}
b_o_s=0;
}break;
/*Codec setup header.*/
case OC_PACKET_SETUP_HDR:{
int ret;
oggpackB_reset(_opb);
/*Mark this packet as the setup header.*/
oggpackB_write(_opb,0x82,8);
/*Write the codec string.*/
oc_pack_octets(_opb,"theora",6);
/*Write the quantizer tables.*/
oc_quant_params_pack(_opb,_qinfo);
/*Write the huffman codes.*/
ret=oc_huff_codes_pack(_opb,_codes);
/*This should never happen, because we validate the tables when they
are set.
If you see this, there is a good chance memory is being corrupted.*/
if(ret<0)return ret;
b_o_s=0;
}break;
/*No more headers to emit.*/
default:return 0;
}
/*This is kind of fugly: we hand the user a buffer which they do not own.
We will overwrite it when the next packet is output, so the user better be
done with it by then.
Vorbis is little better: it hands back buffers that it will free the next
time the headers are requested, or when the encoder is cleared.
Hopefully libogg2 will make this much cleaner.*/
packet=oggpackB_get_buffer(_opb);
/*If there's no packet, malloc failed while writing.*/
if(packet==NULL)return TH_EFAULT;
_op->packet=packet;
_op->bytes=oggpackB_bytes(_opb);
_op->b_o_s=b_o_s;
_op->e_o_s=0;
_op->granulepos=0;
_op->packetno=*_packet_state+3;
return ++(*_packet_state)+3;
}
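
The bookkeeping at the end of oc_state_flushheader() can be read as a small state machine: the state starts negative, each call emits one header and advances the state, and a return value of 0 means no headers remain. The sketch below isolates just that arithmetic. The `SKETCH_*` constants are assumed values, chosen so that state+3 yields packet numbers 0, 1 and 2; the real constants are defined elsewhere in the library. The function names are hypothetical.

```c
#include <assert.h>

/*Assumed values for illustration only.*/
#define SKETCH_PACKET_INFO_HDR    (-3)
#define SKETCH_PACKET_COMMENT_HDR (-2)
#define SKETCH_PACKET_SETUP_HDR   (-1)

/*Hypothetical stand-in for the tail of oc_state_flushheader(): records the
   packet number for the current header and advances the state.
  Returns a positive value while a header was produced, or 0 once all three
   headers have been emitted.*/
static int sketch_flushheader(int *_packet_state,long *_packetno){
  switch(*_packet_state){
    case SKETCH_PACKET_INFO_HDR:
    case SKETCH_PACKET_COMMENT_HDR:
    case SKETCH_PACKET_SETUP_HDR:break;
    /*No more headers to emit.*/
    default:return 0;
  }
  *_packetno=*_packet_state+3;
  return ++(*_packet_state)+3;
}

/*Drains the headers the way a caller might, and counts them.*/
static int sketch_count_headers(void){
  int  state;
  long packetno;
  int  n;
  state=SKETCH_PACKET_INFO_HDR;
  packetno=-1;
  n=0;
  while(sketch_flushheader(&state,&packetno)>0){
    /*Packet numbers run 0, 1, 2 in emission order.*/
    assert(packetno==n);
    n++;
  }
  /*The state has reached 0: ready for input.*/
  assert(state==0);
  return n;
}
```

Under these assumptions a caller simply loops until the function returns 0, which matches the description of packet_state as negative while header packets remain.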


@ -0,0 +1,493 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: encint.h 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
#if !defined(_encint_H)
# define _encint_H (1)
# if defined(HAVE_CONFIG_H)
# include "config.h"
# endif
# include "theora/theoraenc.h"
# include "internal.h"
# include "ocintrin.h"
# include "mathops.h"
# include "enquant.h"
# include "huffenc.h"
/*# define OC_COLLECT_METRICS*/
typedef oc_mv oc_mv2[2];
typedef struct oc_enc_opt_vtable oc_enc_opt_vtable;
typedef struct oc_mb_enc_info oc_mb_enc_info;
typedef struct oc_mode_scheme_chooser oc_mode_scheme_chooser;
typedef struct oc_iir_filter oc_iir_filter;
typedef struct oc_frame_metrics oc_frame_metrics;
typedef struct oc_rc_state oc_rc_state;
typedef struct th_enc_ctx oc_enc_ctx;
typedef struct oc_token_checkpoint oc_token_checkpoint;
/*Constants for the packet-out state machine specific to the encoder.*/
/*Next packet to emit: Data packet, but none are ready yet.*/
#define OC_PACKET_EMPTY (0)
/*Next packet to emit: Data packet, and one is ready.*/
#define OC_PACKET_READY (1)
/*All features enabled.*/
#define OC_SP_LEVEL_SLOW (0)
/*Enable early skip.*/
#define OC_SP_LEVEL_EARLY_SKIP (1)
/*Disable motion compensation.*/
#define OC_SP_LEVEL_NOMC (2)
/*Maximum valid speed level.*/
#define OC_SP_LEVEL_MAX (2)
/*The bits used for each of the MB mode codebooks.*/
extern const unsigned char OC_MODE_BITS[2][OC_NMODES];
/*The bits used for each of the MV codebooks.*/
extern const unsigned char OC_MV_BITS[2][64];
/*The minimum value that can be stored in a SB run for each codeword.
The last entry is the upper bound on the length of a single SB run.*/
extern const ogg_uint16_t OC_SB_RUN_VAL_MIN[8];
/*The bits used for each SB run codeword.*/
extern const unsigned char OC_SB_RUN_CODE_NBITS[7];
/*The bits used for each block run length (starting with 1).*/
extern const unsigned char OC_BLOCK_RUN_CODE_NBITS[30];
/*Encoder specific functions with accelerated variants.*/
struct oc_enc_opt_vtable{
unsigned (*frag_sad)(const unsigned char *_src,
const unsigned char *_ref,int _ystride);
unsigned (*frag_sad_thresh)(const unsigned char *_src,
const unsigned char *_ref,int _ystride,unsigned _thresh);
unsigned (*frag_sad2_thresh)(const unsigned char *_src,
const unsigned char *_ref1,const unsigned char *_ref2,int _ystride,
unsigned _thresh);
unsigned (*frag_satd_thresh)(const unsigned char *_src,
const unsigned char *_ref,int _ystride,unsigned _thresh);
unsigned (*frag_satd2_thresh)(const unsigned char *_src,
const unsigned char *_ref1,const unsigned char *_ref2,int _ystride,
unsigned _thresh);
unsigned (*frag_intra_satd)(const unsigned char *_src,int _ystride);
void (*frag_sub)(ogg_int16_t _diff[64],const unsigned char *_src,
const unsigned char *_ref,int _ystride);
void (*frag_sub_128)(ogg_int16_t _diff[64],
const unsigned char *_src,int _ystride);
void (*frag_copy2)(unsigned char *_dst,
const unsigned char *_src1,const unsigned char *_src2,int _ystride);
void (*frag_recon_intra)(unsigned char *_dst,int _ystride,
const ogg_int16_t _residue[64]);
void (*frag_recon_inter)(unsigned char *_dst,
const unsigned char *_src,int _ystride,const ogg_int16_t _residue[64]);
void (*fdct8x8)(ogg_int16_t _y[64],const ogg_int16_t _x[64]);
};
void oc_enc_vtable_init(oc_enc_ctx *_enc);
/*Encoder-specific macroblock information.*/
struct oc_mb_enc_info{
/*Neighboring macro blocks that have MVs available from the current frame.*/
unsigned cneighbors[4];
/*Neighboring macro blocks to use for MVs from the previous frame.*/
unsigned pneighbors[4];
/*The number of current-frame neighbors.*/
unsigned char ncneighbors;
/*The number of previous-frame neighbors.*/
unsigned char npneighbors;
/*Flags indicating which MB modes have been refined.*/
unsigned char refined;
/*Motion vectors for a macro block for the current frame and the
previous two frames.
Each is a set of 2 vectors against OC_FRAME_GOLD and OC_FRAME_PREV, which
can be used to estimate constant velocity and constant acceleration
predictors.
Uninitialized MVs are (0,0).*/
oc_mv2 analysis_mv[3];
/*Current unrefined analysis MVs.*/
oc_mv unref_mv[2];
/*Unrefined block MVs.*/
oc_mv block_mv[4];
/*Refined block MVs.*/
oc_mv ref_mv[4];
/*Minimum motion estimation error from the analysis stage.*/
ogg_uint16_t error[2];
/*MB error for half-pel refinement for each frame type.*/
unsigned satd[2];
/*Block error for half-pel refinement.*/
unsigned block_satd[4];
};
/*State machine to estimate the opportunity cost of coding a MB mode.*/
struct oc_mode_scheme_chooser{
/*Pointers to a list containing the index of each mode in the mode
alphabet used by each scheme.
The first entry points to the dynamic scheme0_ranks, while the remaining 7
point to the constant entries stored in OC_MODE_SCHEMES.*/
const unsigned char *mode_ranks[8];
/*The ranks for each mode when coded with scheme 0.
These are optimized so that the more frequent modes have lower ranks.*/
unsigned char scheme0_ranks[OC_NMODES];
/*The list of modes, sorted in descending order of frequency, that
corresponds to the ranks above.*/
unsigned char scheme0_list[OC_NMODES];
/*The number of times each mode has been chosen so far.*/
int mode_counts[OC_NMODES];
/*The list of mode coding schemes, sorted in ascending order of bit cost.*/
unsigned char scheme_list[8];
/*The number of bits used by each mode coding scheme.*/
ptrdiff_t scheme_bits[8];
};
void oc_mode_scheme_chooser_init(oc_mode_scheme_chooser *_chooser);
/*A 2nd order low-pass Bessel follower.
We use this for rate control because it has fast reaction time, but is
critically damped.*/
struct oc_iir_filter{
ogg_int32_t c[2];
ogg_int64_t g;
ogg_int32_t x[2];
ogg_int32_t y[2];
};
/*The 2-pass metrics associated with a single frame.*/
struct oc_frame_metrics{
/*The log base 2 of the scale factor for this frame in Q24 format.*/
ogg_int32_t log_scale;
/*The number of application-requested duplicates of this frame.*/
unsigned dup_count:31;
/*The frame type from pass 1.*/
unsigned frame_type:1;
};
/*Rate control state information.*/
struct oc_rc_state{
/*The target average bits per frame.*/
ogg_int64_t bits_per_frame;
/*The current buffer fullness (bits available to be used).*/
ogg_int64_t fullness;
/*The target buffer fullness.
This is where we'd like to be by the last keyframe that appears in the next
buf_delay frames.*/
ogg_int64_t target;
/*The maximum buffer fullness (total size of the buffer).*/
ogg_int64_t max;
/*The log of the number of pixels in a frame in Q57 format.*/
ogg_int64_t log_npixels;
/*The exponent used in the rate model in Q8 format.*/
unsigned exp[2];
/*The number of frames to distribute the buffer usage over.*/
int buf_delay;
/*The total drop count from the previous frame.
This includes duplicates explicitly requested via the
TH_ENCCTL_SET_DUP_COUNT API as well as frames we chose to drop ourselves.*/
ogg_uint32_t prev_drop_count;
/*The log of an estimated scale factor used to obtain the real framerate, for
VFR sources or, e.g., 12 fps content doubled to 24 fps, etc.*/
ogg_int64_t log_drop_scale;
/*The log of the estimated scale factor for the rate model in Q57 format.*/
ogg_int64_t log_scale[2];
/*The log of the target quantizer level in Q57 format.*/
ogg_int64_t log_qtarget;
/*Will we drop frames to meet bitrate target?*/
unsigned char drop_frames;
/*Do we respect the maximum buffer fullness?*/
unsigned char cap_overflow;
/*Can the reservoir go negative?*/
unsigned char cap_underflow;
/*Second-order lowpass filters to track scale and VFR.*/
oc_iir_filter scalefilter[2];
int inter_count;
int inter_delay;
int inter_delay_target;
oc_iir_filter vfrfilter;
/*Two-pass mode state.
0 => 1-pass encoding.
1 => 1st pass of 2-pass encoding.
2 => 2nd pass of 2-pass encoding.*/
int twopass;
/*Buffer for current frame metrics.*/
unsigned char twopass_buffer[48];
/*The number of bytes in the frame metrics buffer.
When 2-pass encoding is enabled, this is set to 0 after each frame is
submitted, and must be non-zero before the next frame will be accepted.*/
int twopass_buffer_bytes;
int twopass_buffer_fill;
/*Whether or not to force the next frame to be a keyframe.*/
unsigned char twopass_force_kf;
/*The metrics for the previous frame.*/
oc_frame_metrics prev_metrics;
/*The metrics for the current frame.*/
oc_frame_metrics cur_metrics;
/*The buffered metrics for future frames.*/
oc_frame_metrics *frame_metrics;
int nframe_metrics;
int cframe_metrics;
/*The index of the current frame in the circular metric buffer.*/
int frame_metrics_head;
/*The frame count of each type (keyframes, delta frames, and dup frames);
32 bits limits us to 2.268 years at 60 fps.*/
ogg_uint32_t frames_total[3];
/*The number of frames of each type yet to be processed.*/
ogg_uint32_t frames_left[3];
/*The sum of the scale values for each frame type.*/
ogg_int64_t scale_sum[2];
/*The start of the window over which the current scale sums are taken.*/
int scale_window0;
/*The end of the window over which the current scale sums are taken.*/
int scale_window_end;
/*The frame count of each type in the current 2-pass window; this does not
include dup frames.*/
int nframes[3];
/*The total accumulated estimation bias.*/
ogg_int64_t rate_bias;
};
void oc_rc_state_init(oc_rc_state *_rc,oc_enc_ctx *_enc);
void oc_rc_state_clear(oc_rc_state *_rc);
void oc_enc_rc_resize(oc_enc_ctx *_enc);
int oc_enc_select_qi(oc_enc_ctx *_enc,int _qti,int _clamp);
void oc_enc_calc_lambda(oc_enc_ctx *_enc,int _frame_type);
int oc_enc_update_rc_state(oc_enc_ctx *_enc,
long _bits,int _qti,int _qi,int _trial,int _droppable);
int oc_enc_rc_2pass_out(oc_enc_ctx *_enc,unsigned char **_buf);
int oc_enc_rc_2pass_in(oc_enc_ctx *_enc,unsigned char *_buf,size_t _bytes);
/*The internal encoder state.*/
struct th_enc_ctx{
/*Shared encoder/decoder state.*/
oc_theora_state state;
/*Buffer in which to assemble packets.*/
oggpack_buffer opb;
/*Encoder-specific macroblock information.*/
oc_mb_enc_info *mb_info;
/*DC coefficients after prediction.*/
ogg_int16_t *frag_dc;
/*The list of coded macro blocks, in coded order.*/
unsigned *coded_mbis;
/*The number of coded macro blocks.*/
size_t ncoded_mbis;
/*Whether or not packets are ready to be emitted.
This takes on negative values while there are remaining header packets to
be emitted, reaches 0 when the codec is ready for input, and becomes
positive when a frame has been processed and data packets are ready.*/
int packet_state;
/*The maximum distance between keyframes.*/
ogg_uint32_t keyframe_frequency_force;
/*The number of duplicates to produce for the next frame.*/
ogg_uint32_t dup_count;
/*The number of duplicates remaining to be emitted for the current frame.*/
ogg_uint32_t nqueued_dups;
/*The number of duplicates emitted for the last frame.*/
ogg_uint32_t prev_dup_count;
/*The current speed level.*/
int sp_level;
/*Whether or not VP3 compatibility mode has been enabled.*/
unsigned char vp3_compatible;
/*Whether or not any INTER frames have been coded.*/
unsigned char coded_inter_frame;
/*Whether or not previous frame was dropped.*/
unsigned char prevframe_dropped;
/*Stores most recently chosen Huffman tables for each frame type, DC and AC
coefficients, and luma and chroma tokens.
The actual Huffman table used for a given coefficient depends not only on
the choice made here, but also its index in the zig-zag ordering.*/
unsigned char huff_idxs[2][2][2];
/*Current count of bits used by each MV coding mode.*/
size_t mv_bits[2];
/*The mode scheme chooser for estimating mode coding costs.*/
oc_mode_scheme_chooser chooser;
/*The number of vertical super blocks in an MCU.*/
int mcu_nvsbs;
/*The SSD error for skipping each fragment in the current MCU.*/
unsigned *mcu_skip_ssd;
/*The DCT token lists for each coefficient and each plane.*/
unsigned char **dct_tokens[3];
/*The extra bits associated with each DCT token.*/
ogg_uint16_t **extra_bits[3];
/*The number of DCT tokens for each coefficient for each plane.*/
ptrdiff_t ndct_tokens[3][64];
/*Pending EOB runs for each coefficient for each plane.*/
ogg_uint16_t eob_run[3][64];
/*The offset of the first DCT token for each coefficient for each plane.*/
unsigned char dct_token_offs[3][64];
/*The last DC coefficient for each plane and reference frame.*/
int dc_pred_last[3][3];
#if defined(OC_COLLECT_METRICS)
/*Fragment SATD statistics for MB mode estimation metrics.*/
unsigned *frag_satd;
/*Fragment SSD statistics for MB mode estimation metrics.*/
unsigned *frag_ssd;
#endif
/*The R-D optimization parameter.*/
int lambda;
/*The huffman tables in use.*/
th_huff_code huff_codes[TH_NHUFFMAN_TABLES][TH_NDCT_TOKENS];
/*The quantization parameters in use.*/
th_quant_info qinfo;
oc_iquant *enquant_tables[64][3][2];
oc_iquant_table enquant_table_data[64][3][2];
/*An "average" quantizer for each quantizer type (INTRA or INTER) and qi
value.
This is used to parameterize the rate control decisions.
They are kept in the log domain to simplify later processing.
Keep in mind these are DCT domain quantizers, and so are scaled by an
additional factor of 4 from the pixel domain.*/
ogg_int64_t log_qavg[2][64];
/*The buffer state used to drive rate control.*/
oc_rc_state rc;
/*Table for encoder acceleration functions.*/
oc_enc_opt_vtable opt_vtable;
};
void oc_enc_analyze_intra(oc_enc_ctx *_enc,int _recode);
int oc_enc_analyze_inter(oc_enc_ctx *_enc,int _allow_keyframe,int _recode);
#if defined(OC_COLLECT_METRICS)
void oc_enc_mode_metrics_collect(oc_enc_ctx *_enc);
void oc_enc_mode_metrics_dump(oc_enc_ctx *_enc);
#endif
/*Perform fullpel motion search for a single MB against both reference frames.*/
void oc_mcenc_search(oc_enc_ctx *_enc,int _mbi);
/*Refine a MB MV for one frame.*/
void oc_mcenc_refine1mv(oc_enc_ctx *_enc,int _mbi,int _frame);
/*Refine the block MVs.*/
void oc_mcenc_refine4mv(oc_enc_ctx *_enc,int _mbi);
/*Used to roll back a tokenlog transaction when we retroactively decide to
skip a fragment.
A checkpoint is taken right before each token is added.*/
struct oc_token_checkpoint{
/*The color plane the token was added to.*/
unsigned char pli;
/*The zig-zag index the token was added to.*/
unsigned char zzi;
/*The outstanding EOB run count before the token was added.*/
ogg_uint16_t eob_run;
/*The token count before the token was added.*/
ptrdiff_t ndct_tokens;
};
void oc_enc_tokenize_start(oc_enc_ctx *_enc);
int oc_enc_tokenize_ac(oc_enc_ctx *_enc,int _pli,ptrdiff_t _fragi,
ogg_int16_t *_qdct,const ogg_uint16_t *_dequant,const ogg_int16_t *_dct,
int _zzi,oc_token_checkpoint **_stack,int _acmin);
void oc_enc_tokenlog_rollback(oc_enc_ctx *_enc,
const oc_token_checkpoint *_stack,int _n);
void oc_enc_pred_dc_frag_rows(oc_enc_ctx *_enc,
int _pli,int _fragy0,int _frag_yend);
void oc_enc_tokenize_dc_frag_list(oc_enc_ctx *_enc,int _pli,
const ptrdiff_t *_coded_fragis,ptrdiff_t _ncoded_fragis,
int _prev_ndct_tokens1,int _prev_eob_run1);
void oc_enc_tokenize_finish(oc_enc_ctx *_enc);
/*Utility routine to encode one of the header packets.*/
int oc_state_flushheader(oc_theora_state *_state,int *_packet_state,
oggpack_buffer *_opb,const th_quant_info *_qinfo,
const th_huff_code _codes[TH_NHUFFMAN_TABLES][TH_NDCT_TOKENS],
const char *_vendor,th_comment *_tc,ogg_packet *_op);
/*Encoder-specific accelerated functions.*/
void oc_enc_frag_sub(const oc_enc_ctx *_enc,ogg_int16_t _diff[64],
const unsigned char *_src,const unsigned char *_ref,int _ystride);
void oc_enc_frag_sub_128(const oc_enc_ctx *_enc,ogg_int16_t _diff[64],
const unsigned char *_src,int _ystride);
unsigned oc_enc_frag_sad(const oc_enc_ctx *_enc,const unsigned char *_src,
const unsigned char *_ref,int _ystride);
unsigned oc_enc_frag_sad_thresh(const oc_enc_ctx *_enc,
const unsigned char *_src,const unsigned char *_ref,int _ystride,
unsigned _thresh);
unsigned oc_enc_frag_sad2_thresh(const oc_enc_ctx *_enc,
const unsigned char *_src,const unsigned char *_ref1,
const unsigned char *_ref2,int _ystride,unsigned _thresh);
unsigned oc_enc_frag_satd_thresh(const oc_enc_ctx *_enc,
const unsigned char *_src,const unsigned char *_ref,int _ystride,
unsigned _thresh);
unsigned oc_enc_frag_satd2_thresh(const oc_enc_ctx *_enc,
const unsigned char *_src,const unsigned char *_ref1,
const unsigned char *_ref2,int _ystride,unsigned _thresh);
unsigned oc_enc_frag_intra_satd(const oc_enc_ctx *_enc,
const unsigned char *_src,int _ystride);
void oc_enc_frag_copy2(const oc_enc_ctx *_enc,unsigned char *_dst,
const unsigned char *_src1,const unsigned char *_src2,int _ystride);
void oc_enc_frag_recon_intra(const oc_enc_ctx *_enc,
unsigned char *_dst,int _ystride,const ogg_int16_t _residue[64]);
void oc_enc_frag_recon_inter(const oc_enc_ctx *_enc,unsigned char *_dst,
const unsigned char *_src,int _ystride,const ogg_int16_t _residue[64]);
void oc_enc_fdct8x8(const oc_enc_ctx *_enc,ogg_int16_t _y[64],
const ogg_int16_t _x[64]);
/*Default pure-C implementations.*/
void oc_enc_vtable_init_c(oc_enc_ctx *_enc);
void oc_enc_frag_sub_c(ogg_int16_t _diff[64],
const unsigned char *_src,const unsigned char *_ref,int _ystride);
void oc_enc_frag_sub_128_c(ogg_int16_t _diff[64],
const unsigned char *_src,int _ystride);
void oc_enc_frag_copy2_c(unsigned char *_dst,
const unsigned char *_src1,const unsigned char *_src2,int _ystride);
unsigned oc_enc_frag_sad_c(const unsigned char *_src,
const unsigned char *_ref,int _ystride);
unsigned oc_enc_frag_sad_thresh_c(const unsigned char *_src,
const unsigned char *_ref,int _ystride,unsigned _thresh);
unsigned oc_enc_frag_sad2_thresh_c(const unsigned char *_src,
const unsigned char *_ref1,const unsigned char *_ref2,int _ystride,
unsigned _thresh);
unsigned oc_enc_frag_satd_thresh_c(const unsigned char *_src,
const unsigned char *_ref,int _ystride,unsigned _thresh);
unsigned oc_enc_frag_satd2_thresh_c(const unsigned char *_src,
const unsigned char *_ref1,const unsigned char *_ref2,int _ystride,
unsigned _thresh);
unsigned oc_enc_frag_intra_satd_c(const unsigned char *_src,int _ystride);
void oc_enc_fdct8x8_c(ogg_int16_t _y[64],const ogg_int16_t _x[64]);
#endif
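
The header above pairs each accelerated operation with three pieces: a function pointer in oc_enc_opt_vtable, a pure-C default with a `_c` suffix, and a wrapper that takes the encoder context and calls through the table. The following is a reduced sketch of that dispatch pattern with a single entry. All `sketch_*` names are hypothetical, and the context structure is cut down to just the table.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sketch_opt_vtable sketch_opt_vtable;
typedef struct sketch_enc_ctx    sketch_enc_ctx;

/*One-entry version of the function table.*/
struct sketch_opt_vtable{
  unsigned (*frag_sad)(const unsigned char *_src,
   const unsigned char *_ref,int _ystride);
};

/*Stand-in for the encoder context; only the table is kept.*/
struct sketch_enc_ctx{
  sketch_opt_vtable opt_vtable;
};

/*Pure-C default: sum of absolute differences over an 8x8 block.*/
static unsigned sketch_frag_sad_c(const unsigned char *_src,
 const unsigned char *_ref,int _ystride){
  unsigned sad;
  int      i;
  int      j;
  sad=0;
  for(i=0;i<8;i++){
    for(j=0;j<8;j++)sad+=(unsigned)abs(_src[j]-_ref[j]);
    _src+=_ystride;
    _ref+=_ystride;
  }
  return sad;
}

/*Installs the C defaults; a platform-specific initializer could overwrite
   individual entries afterwards.*/
static void sketch_vtable_init_c(sketch_enc_ctx *_enc){
  _enc->opt_vtable.frag_sad=sketch_frag_sad_c;
}

/*Context-taking wrapper: callers never touch the table directly.*/
static unsigned sketch_frag_sad(const sketch_enc_ctx *_enc,
 const unsigned char *_src,const unsigned char *_ref,int _ystride){
  return (*_enc->opt_vtable.frag_sad)(_src,_ref,_ystride);
}

/*Small demonstration: two flat blocks that differ by 3 everywhere.*/
static unsigned sketch_vtable_demo(void){
  sketch_enc_ctx enc;
  unsigned char  a[64];
  unsigned char  b[64];
  int            i;
  for(i=0;i<64;i++){
    a[i]=5;
    b[i]=2;
  }
  sketch_vtable_init_c(&enc);
  assert(enc.opt_vtable.frag_sad!=NULL);
  return sketch_frag_sad(&enc,a,b,8);
}
```

Keeping the table inside the context means the choice of implementation is made once at initialization, and every call site stays the same regardless of which variant was installed.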

File diff suppressed because it is too large


@ -0,0 +1,274 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: enquant.c 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include "encint.h"
void oc_quant_params_pack(oggpack_buffer *_opb,const th_quant_info *_qinfo){
const th_quant_ranges *qranges;
const th_quant_base *base_mats[2*3*64];
int indices[2][3][64];
int nbase_mats;
int nbits;
int ci;
int qi;
int qri;
int qti;
int pli;
int qtj;
int plj;
int bmi;
int i;
i=_qinfo->loop_filter_limits[0];
for(qi=1;qi<64;qi++)i=OC_MAXI(i,_qinfo->loop_filter_limits[qi]);
nbits=OC_ILOG_32(i);
oggpackB_write(_opb,nbits,3);
for(qi=0;qi<64;qi++){
oggpackB_write(_opb,_qinfo->loop_filter_limits[qi],nbits);
}
/*580 bits for VP3.*/
i=1;
for(qi=0;qi<64;qi++)i=OC_MAXI(_qinfo->ac_scale[qi],i);
nbits=OC_ILOGNZ_32(i);
oggpackB_write(_opb,nbits-1,4);
for(qi=0;qi<64;qi++)oggpackB_write(_opb,_qinfo->ac_scale[qi],nbits);
/*516 bits for VP3.*/
i=1;
for(qi=0;qi<64;qi++)i=OC_MAXI(_qinfo->dc_scale[qi],i);
nbits=OC_ILOGNZ_32(i);
oggpackB_write(_opb,nbits-1,4);
for(qi=0;qi<64;qi++)oggpackB_write(_opb,_qinfo->dc_scale[qi],nbits);
/*Consolidate any duplicate base matrices.*/
nbase_mats=0;
for(qti=0;qti<2;qti++)for(pli=0;pli<3;pli++){
qranges=_qinfo->qi_ranges[qti]+pli;
for(qri=0;qri<=qranges->nranges;qri++){
for(bmi=0;;bmi++){
if(bmi>=nbase_mats){
base_mats[bmi]=qranges->base_matrices+qri;
indices[qti][pli][qri]=nbase_mats++;
break;
}
else if(memcmp(base_mats[bmi][0],qranges->base_matrices[qri],
sizeof(base_mats[bmi][0]))==0){
indices[qti][pli][qri]=bmi;
break;
}
}
}
}
/*Write out the list of unique base matrices.
1545 bits for VP3 matrices.*/
oggpackB_write(_opb,nbase_mats-1,9);
for(bmi=0;bmi<nbase_mats;bmi++){
for(ci=0;ci<64;ci++)oggpackB_write(_opb,base_mats[bmi][0][ci],8);
}
/*Now store quant ranges and their associated indices into the base matrix
list.
46 bits for VP3 matrices.*/
nbits=OC_ILOG_32(nbase_mats-1);
for(i=0;i<6;i++){
qti=i/3;
pli=i%3;
qranges=_qinfo->qi_ranges[qti]+pli;
if(i>0){
if(qti>0){
if(qranges->nranges==_qinfo->qi_ranges[qti-1][pli].nranges&&
memcmp(qranges->sizes,_qinfo->qi_ranges[qti-1][pli].sizes,
qranges->nranges*sizeof(qranges->sizes[0]))==0&&
memcmp(indices[qti][pli],indices[qti-1][pli],
(qranges->nranges+1)*sizeof(indices[qti][pli][0]))==0){
oggpackB_write(_opb,1,2);
continue;
}
}
qtj=(i-1)/3;
plj=(i-1)%3;
if(qranges->nranges==_qinfo->qi_ranges[qtj][plj].nranges&&
memcmp(qranges->sizes,_qinfo->qi_ranges[qtj][plj].sizes,
qranges->nranges*sizeof(qranges->sizes[0]))==0&&
memcmp(indices[qti][pli],indices[qtj][plj],
(qranges->nranges+1)*sizeof(indices[qti][pli][0]))==0){
oggpackB_write(_opb,0,1+(qti>0));
continue;
}
oggpackB_write(_opb,1,1);
}
oggpackB_write(_opb,indices[qti][pli][0],nbits);
for(qi=qri=0;qi<63;qri++){
oggpackB_write(_opb,qranges->sizes[qri]-1,OC_ILOG_32(62-qi));
qi+=qranges->sizes[qri];
oggpackB_write(_opb,indices[qti][pli][qri+1],nbits);
}
}
}
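/*Precomputes a multiplier m and a shift l for the divisor 2*_d (note that _d
is doubled first), so that later divisions can be done with a multiply and
shifts instead of an integer divide.*/
static void oc_iquant_init(oc_iquant *_this,ogg_uint16_t _d){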
ogg_uint32_t t;
int l;
_d<<=1;
l=OC_ILOGNZ_32(_d)-1;
t=1+((ogg_uint32_t)1<<16+l)/_d;
_this->m=(ogg_int16_t)(t-0x10000);
_this->l=l;
}
/*See comments at oc_dequant_tables_init() for how the quantization tables'
storage should be initialized.*/
void oc_enquant_tables_init(ogg_uint16_t *_dequant[64][3][2],
oc_iquant *_enquant[64][3][2],const th_quant_info *_qinfo){
int qi;
int pli;
int qti;
/*Initialize the dequantization tables first.*/
oc_dequant_tables_init(_dequant,NULL,_qinfo);
/*Derive the quantization tables directly from the dequantization tables.*/
for(qi=0;qi<64;qi++)for(qti=0;qti<2;qti++)for(pli=0;pli<3;pli++){
int zzi;
int plj;
int qtj;
int dupe;
dupe=0;
for(qtj=0;qtj<=qti;qtj++){
for(plj=0;plj<(qtj<qti?3:pli);plj++){
if(_dequant[qi][pli][qti]==_dequant[qi][plj][qtj]){
dupe=1;
break;
}
}
if(dupe)break;
}
if(dupe){
_enquant[qi][pli][qti]=_enquant[qi][plj][qtj];
continue;
}
/*In the original VP3.2 code, the rounding offset and the size of the
dead zone around 0 were controlled by a "sharpness" parameter.
We now R-D optimize the tokens for each block after quantization,
so the rounding offset should always be 1/2, and an explicit dead
zone is unnecessary.
Hence, all of that VP3.2 code is gone from here, and the remaining
floating point code has been implemented as equivalent integer
code with exact precision.*/
for(zzi=0;zzi<64;zzi++){
oc_iquant_init(_enquant[qi][pli][qti]+zzi,
_dequant[qi][pli][qti][zzi]);
}
}
}
/*This table gives the square root of the fraction of the squared magnitude of
each DCT coefficient relative to the total, scaled by 2**16, for both INTRA
and INTER modes.
These values were measured after motion-compensated prediction, before
quantization, over a large set of test video (from QCIF to 1080p) encoded at
all possible rates.
The DC coefficient takes into account the DPCM prediction (using the
quantized values from neighboring blocks, as the encoder does, but still
before quantization of the coefficient in the current block).
The results differ significantly from the expected variance (e.g., using an
AR(1) model of the signal with rho=0.95, as is frequently done to compute
the coding gain of the DCT).
We use them to estimate an "average" quantizer for a given quantizer matrix,
as this is used to parameterize a number of the rate control decisions.
These values are themselves probably quantizer-matrix dependent, since the
shape of the matrix affects the noise distribution in the reference frames,
but they should at least give us _some_ amount of adaptivity to different
matrices, as opposed to hard-coding a table of average Q values for the
current set.
The main features they capture are that a) only a few of the quantizers in
the upper-left corner contribute anything significant at all (though INTER
mode is significantly flatter) and b) the DPCM prediction of the DC
coefficient gives a very minor improvement in the INTRA case and a quite
significant one in the INTER case (over the expected variance).*/
static const ogg_uint16_t OC_RPSD[2][64]={
{
52725,17370,10399, 6867, 5115, 3798, 2942, 2076,
17370, 9900, 6948, 4994, 3836, 2869, 2229, 1619,
10399, 6948, 5516, 4202, 3376, 2573, 2015, 1461,
6867, 4994, 4202, 3377, 2800, 2164, 1718, 1243,
5115, 3836, 3376, 2800, 2391, 1884, 1530, 1091,
3798, 2869, 2573, 2164, 1884, 1495, 1212, 873,
2942, 2229, 2015, 1718, 1530, 1212, 1001, 704,
2076, 1619, 1461, 1243, 1091, 873, 704, 474
},
{
23411,15604,13529,11601,10683, 8958, 7840, 6142,
15604,11901,10718, 9108, 8290, 6961, 6023, 4487,
13529,10718, 9961, 8527, 7945, 6689, 5742, 4333,
11601, 9108, 8527, 7414, 7084, 5923, 5175, 3743,
10683, 8290, 7945, 7084, 6771, 5754, 4793, 3504,
8958, 6961, 6689, 5923, 5754, 4679, 3936, 2989,
7840, 6023, 5742, 5175, 4793, 3936, 3522, 2558,
6142, 4487, 4333, 3743, 3504, 2989, 2558, 1829
}
};
/*The fraction of the squared magnitude of the residuals in each color channel
relative to the total, scaled by 2**16, for each pixel format.
These values were measured after motion-compensated prediction, before
quantization, over a large set of test video encoded at all possible rates.
TODO: These values are only from INTER frames; they should be re-measured for
INTRA frames.*/
static const ogg_uint16_t OC_PCD[4][3]={
{59926, 3038, 2572},
{55201, 5597, 4738},
{55201, 5597, 4738},
{47682, 9669, 8185}
};
/*Compute an "average" quantizer for each qi level.
We do one for INTER and one for INTRA, since their behavior is very
different, but average across chroma channels.
The basic approach is to compute a harmonic average of the squared quantizer,
weighted by the expected squared magnitude of the DCT coefficients.
Under the (not quite true) assumption that DCT coefficients are
Laplacian-distributed, this preserves the product Q*lambda, where
lambda=sqrt(2/sigma**2) is the Laplacian distribution parameter (not to be
confused with the lambda used in R-D optimization throughout most of the
rest of the code).
The value Q*lambda completely determines the entropy of the coefficients.*/
void oc_enquant_qavg_init(ogg_int64_t _log_qavg[2][64],
ogg_uint16_t *_dequant[64][3][2],int _pixel_fmt){
int qi;
int pli;
int qti;
int ci;
for(qti=0;qti<2;qti++)for(qi=0;qi<64;qi++){
ogg_int64_t q2;
q2=0;
for(pli=0;pli<3;pli++){
ogg_uint32_t qp;
qp=0;
for(ci=0;ci<64;ci++){
unsigned rq;
unsigned qd;
qd=_dequant[qi][pli][qti][OC_IZIG_ZAG[ci]];
rq=(OC_RPSD[qti][ci]+(qd>>1))/qd;
qp+=rq*(ogg_uint32_t)rq;
}
q2+=OC_PCD[_pixel_fmt][pli]*(ogg_int64_t)qp;
}
/*qavg=1.0/sqrt(q2).*/
_log_qavg[qti][qi]=OC_Q57(48)-oc_blog64(q2)>>1;
}
}
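/*Editorial sketch (not part of libtheora): a floating-point analogue of the
   weighted harmonic average described above, to make the arithmetic easier to
   follow.
  The name qavg2_sketch and its parameters are hypothetical.
  w2[i] plays the role of (OC_RPSD/65536)**2, already folded together with the
   per-plane weight, and q[i] is the quantizer for coefficient i.
  It returns qavg**2 rather than qavg, so no square root is needed.*/

```c
#include <stddef.h>

/*Weighted harmonic average of the squared quantizers:
   qavg**2=1/sum(w2[i]/q[i]**2).
  The weights w2[] are assumed to sum to 1.*/
static double qavg2_sketch(const double *w2,const double *q,size_t n){
  double inv;
  size_t i;
  inv=0;
  for(i=0;i<n;i++)inv+=w2[i]/(q[i]*q[i]);
  return 1/inv;
}
```

/*With a flat matrix (every q[i] equal to Q) the result is exactly Q**2, which
   is the sanity check one would expect of any "average" quantizer.*/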

@ -0,0 +1,27 @@
#if !defined(_enquant_H)
# define _enquant_H (1)
# include "quant.h"
typedef struct oc_iquant oc_iquant;
#define OC_QUANT_MAX_LOG (OC_Q57(OC_STATIC_ILOG_32(OC_QUANT_MAX)-1))
/*Used to compute x/d via (((x*m>>16)+x>>l)+(x<0))
(i.e., one 16x16->16 mul, 2 shifts, and 2 adds).
This is not an approximation; for 16-bit x and d, it is exact.*/
struct oc_iquant{
ogg_int16_t m;
ogg_int16_t l;
};
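/*Editorial sketch (not part of libtheora): a portable illustration of the
   multiply-and-shift division described above.
  All names ending in _sketch are hypothetical.
  Unlike the expression in the comment, this version works on |x| and restores
   the sign afterwards, so it does not rely on arithmetic right shifts of
   negative values; it also uses a 64-bit intermediate for clarity.
  It assumes 1<=d<=65535 and |x|<=32767.*/

```c
#include <assert.h>
#include <stdint.h>

typedef struct{
  int16_t m;
  int16_t l;
}iquant_sketch;

/*Computes the multiplier m and shift l for a divisor d.*/
static void iquant_sketch_init(iquant_sketch *_q,uint16_t _d){
  uint32_t t;
  uint32_t v;
  int      l;
  assert(_d!=0);
  /*l=floor(log2(_d)).*/
  for(l=-1,v=_d;v;v>>=1)l++;
  /*t lies in [0x8001,0x10001], so t-0x10000 fits in 16 signed bits.*/
  t=1+((uint32_t)1<<(16+l))/_d;
  _q->m=(int16_t)((int32_t)t-0x10000);
  _q->l=(int16_t)l;
}

/*Returns x/d, truncated towards zero, without a divide instruction.*/
static int32_t iquant_sketch_div(const iquant_sketch *_q,int32_t _x){
  int64_t p;
  int32_t a;
  int32_t r;
  a=_x<0?-_x:_x;
  /*a*m+(a<<16) equals a*t, which is never negative.*/
  p=(int64_t)a*_q->m+((int64_t)a<<16);
  r=(int32_t)(p>>16)>>_q->l;
  return _x<0?-r:r;
}
```

/*Typical use: call iquant_sketch_init() once per divisor, then call
   iquant_sketch_div() for every value that has to be divided by it.*/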
typedef oc_iquant oc_iquant_table[64];
void oc_quant_params_pack(oggpack_buffer *_opb,const th_quant_info *_qinfo);
void oc_enquant_tables_init(ogg_uint16_t *_dequant[64][3][2],
oc_iquant *_enquant[64][3][2],const th_quant_info *_qinfo);
void oc_enquant_qavg_init(ogg_int64_t _log_qavg[2][64],
ogg_uint16_t *_dequant[64][3][2],int _pixel_fmt);
#endif

@ -0,0 +1,422 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: fdct.c 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
#include "encint.h"
#include "dct.h"
/*Performs a forward 8 point Type-II DCT transform.
The output is scaled by a factor of 2 from the orthonormal version of the
transform.
_y: The buffer to store the result in.
Data will be placed in the first 8 entries (e.g., in a row of an 8x8 block).
_x: The input coefficients.
Every 8th entry is used (e.g., from a column of an 8x8 block).*/
static void oc_fdct8(ogg_int16_t _y[8],const ogg_int16_t *_x){
int t0;
int t1;
int t2;
int t3;
int t4;
int t5;
int t6;
int t7;
int r;
int s;
int u;
int v;
/*Stage 1:*/
/*0-7 butterfly.*/
t0=_x[0<<3]+(int)_x[7<<3];
t7=_x[0<<3]-(int)_x[7<<3];
/*1-6 butterfly.*/
t1=_x[1<<3]+(int)_x[6<<3];
t6=_x[1<<3]-(int)_x[6<<3];
/*2-5 butterfly.*/
t2=_x[2<<3]+(int)_x[5<<3];
t5=_x[2<<3]-(int)_x[5<<3];
/*3-4 butterfly.*/
t3=_x[3<<3]+(int)_x[4<<3];
t4=_x[3<<3]-(int)_x[4<<3];
/*Stage 2:*/
/*0-3 butterfly.*/
r=t0+t3;
t3=t0-t3;
t0=r;
/*1-2 butterfly.*/
r=t1+t2;
t2=t1-t2;
t1=r;
/*6-5 butterfly.*/
r=t6+t5;
t5=t6-t5;
t6=r;
/*Stages 3 and 4 are where all the approximation occurs.
These are chosen to be as close to an exact inverse of the approximations
made in the iDCT as possible, while still using mostly 16-bit arithmetic.
We use some 16x16->32 signed MACs, but those still commonly execute in 1
cycle on a 16-bit DSP.
For example, s=(27146*t5+0x4000>>16)+t5+(t5!=0) is an exact inverse of
t5=(OC_C4S4*s>>16).
That is, applying the latter to the output of the former will recover t5
exactly (over the valid input range of t5, -23171...23169).
We increase the rounding bias to 0xB500 in this particular case so that
errors inverting the subsequent butterfly are not one-sided (e.g., the
mean error is very close to zero).
The (t5!=0) term could be replaced simply by 1, but we want to send 0 to 0.
The fDCT of an all-zeros block will still not be zero, because of the
biases we added at the very beginning of the process, but it will be close
enough that it is guaranteed to round to zero.*/
/*Stage 3:*/
/*4-5 butterfly.*/
s=(27146*t5+0xB500>>16)+t5+(t5!=0)>>1;
r=t4+s;
t5=t4-s;
t4=r;
/*7-6 butterfly.*/
s=(27146*t6+0xB500>>16)+t6+(t6!=0)>>1;
r=t7+s;
t6=t7-s;
t7=r;
/*Stage 4:*/
/*0-1 butterfly.*/
r=(27146*t0+0x4000>>16)+t0+(t0!=0);
s=(27146*t1+0xB500>>16)+t1+(t1!=0);
u=r+s>>1;
v=r-u;
_y[0]=u;
_y[4]=v;
/*3-2 rotation by 6pi/16*/
u=(OC_C6S2*t2+OC_C2S6*t3+0x6CB7>>16)+(t3!=0);
s=(OC_C6S2*u>>16)-t2;
v=(s*21600+0x2800>>18)+s+(s!=0);
_y[2]=u;
_y[6]=v;
/*6-5 rotation by 3pi/16*/
u=(OC_C5S3*t6+OC_C3S5*t5+0x0E3D>>16)+(t5!=0);
s=t6-(OC_C5S3*u>>16);
v=(s*26568+0x3400>>17)+s+(s!=0);
_y[5]=u;
_y[3]=v;
/*7-4 rotation by 7pi/16*/
u=(OC_C7S1*t4+OC_C1S7*t7+0x7B1B>>16)+(t7!=0);
s=(OC_C7S1*u>>16)-t4;
v=(s*20539+0x3000>>20)+s+(s!=0);
_y[1]=u;
_y[7]=v;
}
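/*Editorial sketch (not part of libtheora): a small check of the "exact
   inverse" example given in the Stage 3/4 comment of oc_fdct8(), using the
   unbiased 0x4000 rounding constant.
  The value 46341 for OC_C4S4 is an assumption here (cos(pi/4) scaled by
   2**16); the real constant is defined in dct.h, which is not shown.
  Like the code above, this relies on >> being an arithmetic shift for
   negative values, which ANSI C does not guarantee.*/

```c
#include <stdint.h>

/*Assumed value of OC_C4S4.*/
#define C4S4_SKETCH (46341)

/*The forward step: roughly t*sqrt(2), rounded and nudged away from zero.*/
static int32_t c4s4_fwd_sketch(int32_t _t){
  return ((27146*_t+0x4000)>>16)+_t+(_t!=0);
}

/*The iDCT step it is meant to invert: s*cos(pi/4), rounded down.*/
static int32_t c4s4_inv_sketch(int32_t _s){
  return (C4S4_SKETCH*_s)>>16;
}
```

/*Applying c4s4_inv_sketch() to the output of c4s4_fwd_sketch() should return
   the original value for every input in -23171...23169.*/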
void oc_enc_fdct8x8(const oc_enc_ctx *_enc,ogg_int16_t _y[64],
const ogg_int16_t _x[64]){
(*_enc->opt_vtable.fdct8x8)(_y,_x);
}
/*Performs a forward 8x8 Type-II DCT transform.
The output is scaled by a factor of 4 relative to the orthonormal version
of the transform.
_y: The buffer to store the result in.
This may be the same as _x.
_x: The input coefficients. */
void oc_enc_fdct8x8_c(ogg_int16_t _y[64],const ogg_int16_t _x[64]){
const ogg_int16_t *in;
ogg_int16_t *end;
ogg_int16_t *out;
ogg_int16_t w[64];
int i;
/*Add two extra bits of working precision to improve accuracy; any more and
we could overflow.*/
for(i=0;i<64;i++)w[i]=_x[i]<<2;
/*These biases correct for some systematic error that remains in the full
fDCT->iDCT round trip.*/
w[0]+=(w[0]!=0)+1;
w[1]++;
w[8]--;
/*Transform columns of w into rows of _y.*/
for(in=w,out=_y,end=out+64;out<end;in++,out+=8)oc_fdct8(out,in);
/*Transform columns of _y into rows of w.*/
for(in=_y,out=w,end=out+64;out<end;in++,out+=8)oc_fdct8(out,in);
/*Round the result back to the external working precision (which is still
scaled by four relative to the orthonormal result).
TODO: We should just update the external working precision.*/
for(i=0;i<64;i++)_y[i]=w[i]+2>>2;
}
/*This does not seem to outperform simple LFE border padding before MC.
It yields higher PSNR, but much higher bitrate usage.*/
#if 0
typedef struct oc_extension_info oc_extension_info;
/*Information needed to pad boundary blocks.
We multiply each row/column by an extension matrix that fills in the padding
values as a linear combination of the active values, so that an equivalent
number of coefficients are forced to zero.
This costs at most 16 multiplies, the same as a 1-D fDCT itself, and as
little as 7 multiplies.
We compute the extension matrices for every possible shape in advance, as
there are only 35.
The coefficients for all matrices are stored in a single array to take
advantage of the overlap and repetitiveness of many of the shapes.
A similar technique is applied to the offsets into this array.
This reduces the required table storage by about 48%.
See tools/extgen.c for details.
We could conceivably do the same for all 256 possible shapes.*/
struct oc_extension_info{
/*The mask of the active pixels in the shape.*/
short mask;
/*The number of active pixels in the shape.*/
short na;
/*The extension matrix.
This is (8-na)xna*/
const ogg_int16_t *const *ext;
/*The pixel indices: na active pixels followed by 8-na padding pixels.*/
unsigned char pi[8];
/*The coefficient indices: na unconstrained coefficients followed by 8-na
coefficients to be forced to zero.*/
unsigned char ci[8];
};
/*The number of shapes we need.*/
#define OC_NSHAPES (35)
static const ogg_int16_t OC_EXT_COEFFS[229]={
0x7FFF,0xE1F8,0x6903,0xAA79,0x5587,0x7FFF,0x1E08,0x7FFF,
0x5587,0xAA79,0x6903,0xE1F8,0x7FFF,0x0000,0x0000,0x0000,
0x7FFF,0x0000,0x0000,0x7FFF,0x8000,0x7FFF,0x0000,0x0000,
0x7FFF,0xE1F8,0x1E08,0xB0A7,0xAA1D,0x337C,0x7FFF,0x4345,
0x2267,0x4345,0x7FFF,0x337C,0xAA1D,0xB0A7,0x8A8C,0x4F59,
0x03B4,0xE2D6,0x7FFF,0x2CF3,0x7FFF,0xE2D6,0x03B4,0x4F59,
0x8A8C,0x1103,0x7AEF,0x5225,0xDF60,0xC288,0xDF60,0x5225,
0x7AEF,0x1103,0x668A,0xD6EE,0x3A16,0x0E6C,0xFA07,0x0E6C,
0x3A16,0xD6EE,0x668A,0x2A79,0x2402,0x980F,0x50F5,0x4882,
0x50F5,0x980F,0x2402,0x2A79,0xF976,0x2768,0x5F22,0x2768,
0xF976,0x1F91,0x76C1,0xE9AE,0x76C1,0x1F91,0x7FFF,0xD185,
0x0FC8,0xD185,0x7FFF,0x4F59,0x4345,0xED62,0x4345,0x4F59,
0xF574,0x5D99,0x2CF3,0x5D99,0xF574,0x5587,0x3505,0x30FC,
0xF482,0x953C,0xEAC4,0x7FFF,0x4F04,0x7FFF,0xEAC4,0x953C,
0xF482,0x30FC,0x4F04,0x273D,0xD8C3,0x273D,0x1E09,0x61F7,
0x1E09,0x273D,0xD8C3,0x273D,0x4F04,0x30FC,0xA57E,0x153C,
0x6AC4,0x3C7A,0x1E08,0x3C7A,0x6AC4,0x153C,0xA57E,0x7FFF,
0xA57E,0x5A82,0x6AC4,0x153C,0xC386,0xE1F8,0xC386,0x153C,
0x6AC4,0x5A82,0xD8C3,0x273D,0x7FFF,0xE1F7,0x7FFF,0x273D,
0xD8C3,0x4F04,0x30FC,0xD8C3,0x273D,0xD8C3,0x30FC,0x4F04,
0x1FC8,0x67AD,0x1853,0xE038,0x1853,0x67AD,0x1FC8,0x4546,
0xE038,0x1FC8,0x3ABA,0x1FC8,0xE038,0x4546,0x3505,0x5587,
0xF574,0xBC11,0x78F4,0x4AFB,0xE6F3,0x4E12,0x3C11,0xF8F4,
0x4AFB,0x3C7A,0xF88B,0x3C11,0x78F4,0xCAFB,0x7FFF,0x08CC,
0x070C,0x236D,0x5587,0x236D,0x070C,0xF88B,0x3C7A,0x4AFB,
0xF8F4,0x3C11,0x7FFF,0x153C,0xCAFB,0x153C,0x7FFF,0x1E08,
0xE1F8,0x7FFF,0x08CC,0x7FFF,0xCAFB,0x78F4,0x3C11,0x4E12,
0xE6F3,0x4AFB,0x78F4,0xBC11,0xFE3D,0x7FFF,0xFE3D,0x2F3A,
0x7FFF,0x2F3A,0x89BC,0x7FFF,0x89BC
};
static const ogg_int16_t *const OC_EXT_ROWS[96]={
OC_EXT_COEFFS+ 0,OC_EXT_COEFFS+ 0,OC_EXT_COEFFS+ 0,OC_EXT_COEFFS+ 0,
OC_EXT_COEFFS+ 0,OC_EXT_COEFFS+ 0,OC_EXT_COEFFS+ 0,OC_EXT_COEFFS+ 6,
OC_EXT_COEFFS+ 27,OC_EXT_COEFFS+ 38,OC_EXT_COEFFS+ 43,OC_EXT_COEFFS+ 32,
OC_EXT_COEFFS+ 49,OC_EXT_COEFFS+ 58,OC_EXT_COEFFS+ 67,OC_EXT_COEFFS+ 71,
OC_EXT_COEFFS+ 62,OC_EXT_COEFFS+ 53,OC_EXT_COEFFS+ 12,OC_EXT_COEFFS+ 15,
OC_EXT_COEFFS+ 14,OC_EXT_COEFFS+ 13,OC_EXT_COEFFS+ 76,OC_EXT_COEFFS+ 81,
OC_EXT_COEFFS+ 86,OC_EXT_COEFFS+ 91,OC_EXT_COEFFS+ 96,OC_EXT_COEFFS+ 98,
OC_EXT_COEFFS+ 93,OC_EXT_COEFFS+ 88,OC_EXT_COEFFS+ 83,OC_EXT_COEFFS+ 78,
OC_EXT_COEFFS+ 12,OC_EXT_COEFFS+ 15,OC_EXT_COEFFS+ 15,OC_EXT_COEFFS+ 12,
OC_EXT_COEFFS+ 12,OC_EXT_COEFFS+ 15,OC_EXT_COEFFS+ 12,OC_EXT_COEFFS+ 15,
OC_EXT_COEFFS+ 15,OC_EXT_COEFFS+ 12,OC_EXT_COEFFS+ 103,OC_EXT_COEFFS+ 108,
OC_EXT_COEFFS+ 126,OC_EXT_COEFFS+ 16,OC_EXT_COEFFS+ 137,OC_EXT_COEFFS+ 141,
OC_EXT_COEFFS+ 20,OC_EXT_COEFFS+ 130,OC_EXT_COEFFS+ 113,OC_EXT_COEFFS+ 116,
OC_EXT_COEFFS+ 146,OC_EXT_COEFFS+ 153,OC_EXT_COEFFS+ 160,OC_EXT_COEFFS+ 167,
OC_EXT_COEFFS+ 170,OC_EXT_COEFFS+ 163,OC_EXT_COEFFS+ 156,OC_EXT_COEFFS+ 149,
OC_EXT_COEFFS+ 119,OC_EXT_COEFFS+ 122,OC_EXT_COEFFS+ 174,OC_EXT_COEFFS+ 177,
OC_EXT_COEFFS+ 182,OC_EXT_COEFFS+ 187,OC_EXT_COEFFS+ 192,OC_EXT_COEFFS+ 197,
OC_EXT_COEFFS+ 202,OC_EXT_COEFFS+ 207,OC_EXT_COEFFS+ 210,OC_EXT_COEFFS+ 215,
OC_EXT_COEFFS+ 179,OC_EXT_COEFFS+ 189,OC_EXT_COEFFS+ 24,OC_EXT_COEFFS+ 204,
OC_EXT_COEFFS+ 184,OC_EXT_COEFFS+ 194,OC_EXT_COEFFS+ 212,OC_EXT_COEFFS+ 199,
OC_EXT_COEFFS+ 217,OC_EXT_COEFFS+ 100,OC_EXT_COEFFS+ 134,OC_EXT_COEFFS+ 135,
OC_EXT_COEFFS+ 135,OC_EXT_COEFFS+ 12,OC_EXT_COEFFS+ 15,OC_EXT_COEFFS+ 134,
OC_EXT_COEFFS+ 134,OC_EXT_COEFFS+ 135,OC_EXT_COEFFS+ 220,OC_EXT_COEFFS+ 223,
OC_EXT_COEFFS+ 226,OC_EXT_COEFFS+ 227,OC_EXT_COEFFS+ 224,OC_EXT_COEFFS+ 221
};
static const oc_extension_info OC_EXTENSION_INFO[OC_NSHAPES]={
{0x7F,7,OC_EXT_ROWS+ 0,{0,1,2,3,4,5,6,7},{0,1,2,4,5,6,7,3}},
{0xFE,7,OC_EXT_ROWS+ 7,{1,2,3,4,5,6,7,0},{0,1,2,4,5,6,7,3}},
{0x3F,6,OC_EXT_ROWS+ 8,{0,1,2,3,4,5,7,6},{0,1,3,4,6,7,5,2}},
{0xFC,6,OC_EXT_ROWS+ 10,{2,3,4,5,6,7,1,0},{0,1,3,4,6,7,5,2}},
{0x1F,5,OC_EXT_ROWS+ 12,{0,1,2,3,4,7,6,5},{0,2,3,5,7,6,4,1}},
{0xF8,5,OC_EXT_ROWS+ 15,{3,4,5,6,7,2,1,0},{0,2,3,5,7,6,4,1}},
{0x0F,4,OC_EXT_ROWS+ 18,{0,1,2,3,7,6,5,4},{0,2,4,6,7,5,3,1}},
{0xF0,4,OC_EXT_ROWS+ 18,{4,5,6,7,3,2,1,0},{0,2,4,6,7,5,3,1}},
{0x07,3,OC_EXT_ROWS+ 22,{0,1,2,7,6,5,4,3},{0,3,6,7,5,4,2,1}},
{0xE0,3,OC_EXT_ROWS+ 27,{5,6,7,4,3,2,1,0},{0,3,6,7,5,4,2,1}},
{0x03,2,OC_EXT_ROWS+ 32,{0,1,7,6,5,4,3,2},{0,4,7,6,5,3,2,1}},
{0xC0,2,OC_EXT_ROWS+ 32,{6,7,5,4,3,2,1,0},{0,4,7,6,5,3,2,1}},
{0x01,1,OC_EXT_ROWS+ 0,{0,7,6,5,4,3,2,1},{0,7,6,5,4,3,2,1}},
{0x80,1,OC_EXT_ROWS+ 0,{7,6,5,4,3,2,1,0},{0,7,6,5,4,3,2,1}},
{0x7E,6,OC_EXT_ROWS+ 42,{1,2,3,4,5,6,7,0},{0,1,2,5,6,7,4,3}},
{0x7C,5,OC_EXT_ROWS+ 44,{2,3,4,5,6,7,1,0},{0,1,4,5,7,6,3,2}},
{0x3E,5,OC_EXT_ROWS+ 47,{1,2,3,4,5,7,6,0},{0,1,4,5,7,6,3,2}},
{0x78,4,OC_EXT_ROWS+ 50,{3,4,5,6,7,2,1,0},{0,4,5,7,6,3,2,1}},
{0x3C,4,OC_EXT_ROWS+ 54,{2,3,4,5,7,6,1,0},{0,3,4,7,6,5,2,1}},
{0x1E,4,OC_EXT_ROWS+ 58,{1,2,3,4,7,6,5,0},{0,4,5,7,6,3,2,1}},
{0x70,3,OC_EXT_ROWS+ 62,{4,5,6,7,3,2,1,0},{0,5,7,6,4,3,2,1}},
{0x38,3,OC_EXT_ROWS+ 67,{3,4,5,7,6,2,1,0},{0,5,6,7,4,3,2,1}},
{0x1C,3,OC_EXT_ROWS+ 72,{2,3,4,7,6,5,1,0},{0,5,6,7,4,3,2,1}},
{0x0E,3,OC_EXT_ROWS+ 77,{1,2,3,7,6,5,4,0},{0,5,7,6,4,3,2,1}},
{0x60,2,OC_EXT_ROWS+ 82,{5,6,7,4,3,2,1,0},{0,2,7,6,5,4,3,1}},
{0x30,2,OC_EXT_ROWS+ 36,{4,5,7,6,3,2,1,0},{0,4,7,6,5,3,2,1}},
{0x18,2,OC_EXT_ROWS+ 90,{3,4,7,6,5,2,1,0},{0,1,7,6,5,4,3,2}},
{0x0C,2,OC_EXT_ROWS+ 34,{2,3,7,6,5,4,1,0},{0,4,7,6,5,3,2,1}},
{0x06,2,OC_EXT_ROWS+ 84,{1,2,7,6,5,4,3,0},{0,2,7,6,5,4,3,1}},
{0x40,1,OC_EXT_ROWS+ 0,{6,7,5,4,3,2,1,0},{0,7,6,5,4,3,2,1}},
{0x20,1,OC_EXT_ROWS+ 0,{5,7,6,4,3,2,1,0},{0,7,6,5,4,3,2,1}},
{0x10,1,OC_EXT_ROWS+ 0,{4,7,6,5,3,2,1,0},{0,7,6,5,4,3,2,1}},
{0x08,1,OC_EXT_ROWS+ 0,{3,7,6,5,4,2,1,0},{0,7,6,5,4,3,2,1}},
{0x04,1,OC_EXT_ROWS+ 0,{2,7,6,5,4,3,1,0},{0,7,6,5,4,3,2,1}},
{0x02,1,OC_EXT_ROWS+ 0,{1,7,6,5,4,3,2,0},{0,7,6,5,4,3,2,1}}
};
/*Pads a single column of a partial block and then performs a forward Type-II
DCT on the result.
The input is scaled by a factor of 4 and biased appropriately for the current
fDCT implementation.
The output is scaled by an additional factor of 2 from the orthonormal
version of the transform.
_y: The buffer to store the result in.
Data will be placed in the first 8 entries (e.g., in a row of an 8x8 block).
_x: The input coefficients.
Every 8th entry is used (e.g., from a column of an 8x8 block).
_e: The extension information for the shape.*/
static void oc_fdct8_ext(ogg_int16_t _y[8],ogg_int16_t *_x,
const oc_extension_info *_e){
const unsigned char *pi;
int na;
na=_e->na;
pi=_e->pi;
if(na==1){
int ci;
/*While the branch below is still correct for shapes with na==1, we can
perform the entire transform with just 1 multiply in this case instead
of 23.*/
_y[0]=(ogg_int16_t)(OC_DIV2_16(OC_C4S4*(_x[pi[0]])));
for(ci=1;ci<8;ci++)_y[ci]=0;
}
else{
const ogg_int16_t *const *ext;
int zpi;
int api;
int nz;
/*First multiply by the extension matrix to compute the padding values.*/
nz=8-na;
ext=_e->ext;
for(zpi=0;zpi<nz;zpi++){
ogg_int32_t v;
v=0;
for(api=0;api<na;api++){
v+=ext[zpi][api]*(ogg_int32_t)(_x[pi[api]<<3]<<1);
}
_x[pi[na+zpi]<<3]=(ogg_int16_t)(v+0x8000>>16)+1>>1;
}
oc_fdct8(_y,_x);
}
}
/*Performs a forward 8x8 Type-II DCT transform on blocks which overlap the
border of the picture region.
This method ONLY works with rectangular regions.
_border: A description of which pixels are inside the border.
_y: The buffer to store the result in.
This may be the same as _x.
_x: The input pixel values.
Pixel values outside the border will be ignored.*/
void oc_fdct8x8_border(const oc_border_info *_border,
ogg_int16_t _y[64],const ogg_int16_t _x[64]){
ogg_int16_t *in;
ogg_int16_t *out;
ogg_int16_t w[64];
ogg_int64_t mask;
const oc_extension_info *cext;
const oc_extension_info *rext;
int cmask;
int rmask;
int ri;
int ci;
/*Identify the shapes of the non-zero rows and columns.*/
rmask=cmask=0;
mask=_border->mask;
for(ri=0;ri<8;ri++){
/*This aggregation is _only_ correct for rectangular masks.*/
cmask|=((mask&0xFF)!=0)<<ri;
rmask|=mask&0xFF;
mask>>=8;
}
/*Find the associated extension info for these shapes.*/
if(cmask==0xFF)cext=NULL;
else for(cext=OC_EXTENSION_INFO;cext->mask!=cmask;){
/*If we somehow can't find the shape, then just do an unpadded fDCT.
It won't be efficient, but it should still be correct.*/
if(++cext>=OC_EXTENSION_INFO+OC_NSHAPES){
oc_enc_fdct8x8_c(_y,_x);
return;
}
}
if(rmask==0xFF)rext=NULL;
else for(rext=OC_EXTENSION_INFO;rext->mask!=rmask;){
/*If we somehow can't find the shape, then just do an unpadded fDCT.
It won't be efficient, but it should still be correct.*/
if(++rext>=OC_EXTENSION_INFO+OC_NSHAPES){
oc_enc_fdct8x8_c(_y,_x);
return;
}
}
/*Add two extra bits of working precision to improve accuracy; any more and
we could overflow.*/
for(ci=0;ci<64;ci++)w[ci]=_x[ci]<<2;
/*These biases correct for some systematic error that remains in the full
fDCT->iDCT round trip.
We can safely add them before padding, since if these pixel values are
overwritten, we didn't care what they were anyway (and the unbiased values
will usually yield smaller DCT coefficient magnitudes).*/
w[0]+=(w[0]!=0)+1;
w[1]++;
w[8]--;
/*Transform the columns.
We can ignore zero columns without a problem.*/
in=w;
out=_y;
if(cext==NULL)for(ci=0;ci<8;ci++)oc_fdct8(out+(ci<<3),in+ci);
else for(ci=0;ci<8;ci++)if(rmask&(1<<ci))oc_fdct8_ext(out+(ci<<3),in+ci,cext);
/*Transform the rows.
We transform even rows that are supposedly zero, because rounding errors
may make them slightly non-zero, and this will give a more precise
reconstruction with very small quantizers.*/
in=_y;
out=w;
if(rext==NULL)for(ri=0;ri<8;ri++)oc_fdct8(out+(ri<<3),in+ri);
else for(ri=0;ri<8;ri++)oc_fdct8_ext(out+(ri<<3),in+ri,rext);
/*Round the result back to the external working precision (which is still
scaled by four relative to the orthonormal result).
TODO: We should just update the external working precision.*/
for(ci=0;ci<64;ci++)_y[ci]=w[ci]+2>>2;
}
#endif

@ -0,0 +1,87 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: fragment.c 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
#include <string.h>
#include "internal.h"
void oc_frag_copy(const oc_theora_state *_state,unsigned char *_dst,
const unsigned char *_src,int _ystride){
(*_state->opt_vtable.frag_copy)(_dst,_src,_ystride);
}
void oc_frag_copy_c(unsigned char *_dst,const unsigned char *_src,int _ystride){
int i;
for(i=8;i-->0;){
memcpy(_dst,_src,8*sizeof(*_dst));
_dst+=_ystride;
_src+=_ystride;
}
}
void oc_frag_recon_intra(const oc_theora_state *_state,unsigned char *_dst,
int _ystride,const ogg_int16_t _residue[64]){
_state->opt_vtable.frag_recon_intra(_dst,_ystride,_residue);
}
void oc_frag_recon_intra_c(unsigned char *_dst,int _ystride,
const ogg_int16_t _residue[64]){
int i;
for(i=0;i<8;i++){
int j;
for(j=0;j<8;j++)_dst[j]=OC_CLAMP255(_residue[i*8+j]+128);
_dst+=_ystride;
}
}
void oc_frag_recon_inter(const oc_theora_state *_state,unsigned char *_dst,
const unsigned char *_src,int _ystride,const ogg_int16_t _residue[64]){
_state->opt_vtable.frag_recon_inter(_dst,_src,_ystride,_residue);
}
void oc_frag_recon_inter_c(unsigned char *_dst,
const unsigned char *_src,int _ystride,const ogg_int16_t _residue[64]){
int i;
for(i=0;i<8;i++){
int j;
for(j=0;j<8;j++)_dst[j]=OC_CLAMP255(_residue[i*8+j]+_src[j]);
_dst+=_ystride;
_src+=_ystride;
}
}
void oc_frag_recon_inter2(const oc_theora_state *_state,unsigned char *_dst,
const unsigned char *_src1,const unsigned char *_src2,int _ystride,
const ogg_int16_t _residue[64]){
_state->opt_vtable.frag_recon_inter2(_dst,_src1,_src2,_ystride,_residue);
}
void oc_frag_recon_inter2_c(unsigned char *_dst,const unsigned char *_src1,
const unsigned char *_src2,int _ystride,const ogg_int16_t _residue[64]){
int i;
for(i=0;i<8;i++){
int j;
for(j=0;j<8;j++)_dst[j]=OC_CLAMP255(_residue[i*8+j]+(_src1[j]+_src2[j]>>1));
_dst+=_ystride;
_src1+=_ystride;
_src2+=_ystride;
}
}
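/*Editorial sketch (not part of libtheora): the per-pixel arithmetic of
   oc_frag_recon_inter2_c() in isolation.
  clamp255_sketch() is a hypothetical stand-in for OC_CLAMP255, whose
   definition is not shown here; it is assumed to saturate to 0...255.
  The two predictors are averaged with truncation (no rounding bias) before
   the residue is added.*/

```c
/*Saturates x to the range 0...255.*/
static unsigned char clamp255_sketch(int _x){
  return (unsigned char)(_x<0?0:_x>255?255:_x);
}

/*Reconstructs one pixel from two predictors and a residue.*/
static unsigned char recon_inter2_pixel_sketch(int _residue,
 unsigned char _src1,unsigned char _src2){
  return clamp255_sketch(_residue+((_src1+_src2)>>1));
}
```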
void oc_restore_fpu(const oc_theora_state *_state){
_state->opt_vtable.restore_fpu();
}
void oc_restore_fpu_c(void){}

@ -0,0 +1,489 @@
/********************************************************************
* *
* THIS FILE IS PART OF THE OggTheora SOFTWARE CODEC SOURCE CODE. *
* USE, DISTRIBUTION AND REPRODUCTION OF THIS LIBRARY SOURCE IS *
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: huffdec.c 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
#include <stdlib.h>
#include <string.h>
#include <ogg/ogg.h>
#include "huffdec.h"
#include "decint.h"
/*The ANSI offsetof macro is broken on some platforms (e.g., older DECs).*/
#define _ogg_offsetof(_type,_field)\
((size_t)((char *)&((_type *)0)->_field-(char *)0))
/*The number of internal tokens associated with each of the spec tokens.*/
static const unsigned char OC_DCT_TOKEN_MAP_ENTRIES[TH_NDCT_TOKENS]={
1,1,1,4,8,1,1,8,1,1,1,1,1,2,2,2,2,4,8,2,2,2,4,2,2,2,2,2,8,2,4,8
};
/*The map from external spec-defined tokens to internal tokens.
This is constructed so that any extra bits read with the original token value
can be masked off the least significant bits of its internal token index.
In addition, all of the tokens which require additional extra bits are placed
at the start of the list, and grouped by type.
OC_DCT_REPEAT_RUN3_TOKEN is placed first, as it is an extra-special case, so
giving it index 0 may simplify comparisons on some architectures.
These requirements require some substantial reordering.*/
static const unsigned char OC_DCT_TOKEN_MAP[TH_NDCT_TOKENS]={
/*OC_DCT_EOB1_TOKEN (0 extra bits)*/
15,
/*OC_DCT_EOB2_TOKEN (0 extra bits)*/
16,
/*OC_DCT_EOB3_TOKEN (0 extra bits)*/
17,
/*OC_DCT_REPEAT_RUN0_TOKEN (2 extra bits)*/
88,
/*OC_DCT_REPEAT_RUN1_TOKEN (3 extra bits)*/
80,
/*OC_DCT_REPEAT_RUN2_TOKEN (4 extra bits)*/
1,
/*OC_DCT_REPEAT_RUN3_TOKEN (12 extra bits)*/
0,
/*OC_DCT_SHORT_ZRL_TOKEN (3 extra bits)*/
48,
/*OC_DCT_ZRL_TOKEN (6 extra bits)*/
14,
/*OC_ONE_TOKEN (0 extra bits)*/
56,
/*OC_MINUS_ONE_TOKEN (0 extra bits)*/
57,
/*OC_TWO_TOKEN (0 extra bits)*/
58,
/*OC_MINUS_TWO_TOKEN (0 extra bits)*/
59,
/*OC_DCT_VAL_CAT2 (1 extra bit)*/
60,
62,
64,
66,
/*OC_DCT_VAL_CAT3 (2 extra bits)*/
68,
/*OC_DCT_VAL_CAT4 (3 extra bits)*/
72,
/*OC_DCT_VAL_CAT5 (4 extra bits)*/
2,
/*OC_DCT_VAL_CAT6 (5 extra bits)*/
4,
/*OC_DCT_VAL_CAT7 (6 extra bits)*/
6,
/*OC_DCT_VAL_CAT8 (10 extra bits)*/
8,
/*OC_DCT_RUN_CAT1A (1 extra bit)*/
18,
20,
22,
24,
26,
/*OC_DCT_RUN_CAT1B (3 extra bits)*/
32,
/*OC_DCT_RUN_CAT1C (4 extra bits)*/
12,
/*OC_DCT_RUN_CAT2A (2 extra bits)*/
28,
/*OC_DCT_RUN_CAT2B (3 extra bits)*/
40
};
/*These three functions are really part of the bitpack.c module, but
they are only used here.
Declaring local static versions so they can be inlined saves considerable
function call overhead.*/
static oc_pb_window oc_pack_refill(oc_pack_buf *_b,int _bits){
const unsigned char *ptr;
const unsigned char *stop;
oc_pb_window window;
int available;
window=_b->window;
available=_b->bits;
ptr=_b->ptr;
stop=_b->stop;
/*This version of _refill() doesn't bother setting eof because we won't
check for it after we've started decoding DCT tokens.*/
if(ptr>=stop)available=OC_LOTS_OF_BITS;
while(available<=OC_PB_WINDOW_SIZE-8){
available+=8;
window|=(oc_pb_window)*ptr++<<OC_PB_WINDOW_SIZE-available;
if(ptr>=stop)available=OC_LOTS_OF_BITS;
}
_b->ptr=ptr;
if(_bits>available)window|=*ptr>>(available&7);
_b->bits=available;
return window;
}
/*Read in bits without advancing the bit pointer.
Here we assume 0<=_bits&&_bits<=32.*/
static long oc_pack_look(oc_pack_buf *_b,int _bits){
oc_pb_window window;
int available;
long result;
window=_b->window;
available=_b->bits;
if(_bits==0)return 0;
if(_bits>available)_b->window=window=oc_pack_refill(_b,_bits);
result=window>>OC_PB_WINDOW_SIZE-_bits;
return result;
}
/*Advance the bit pointer.*/
static void oc_pack_adv(oc_pack_buf *_b,int _bits){
/*We ignore the special cases for _bits==0 and _bits==32 here, since they are
never actually used.
OC_HUFF_SLUSH (defined below) would have to be at least 27 to actually read
32 bits in a single go, and would require a 32 GB lookup table (assuming
8 byte pointers, since 4 byte pointers couldn't fit such a table).*/
_b->window<<=_bits;
_b->bits-=_bits;
}
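/*Editorial sketch (not part of libtheora): a simplified MSB-first bit window
   showing the look/advance split used by the three functions above.
  All names ending in _sketch are hypothetical.
  It assumes a fixed 32-bit window, requests of at most 25 bits, and a caller
   that always looks before it advances.
  Unlike oc_pack_refill(), it has no OC_LOTS_OF_BITS handling: reads past the
   end of the buffer simply see zero bits.*/

```c
#include <stdint.h>

typedef struct{
  const unsigned char *ptr;
  const unsigned char *stop;
  uint32_t             window;
  int                  bits;
}bitwin_sketch;

static void bitwin_init_sketch(bitwin_sketch *_b,const unsigned char *_buf,
 int _len){
  _b->ptr=_buf;
  _b->stop=_buf+_len;
  _b->window=0;
  _b->bits=0;
}

/*Returns the next _bits bits without consuming them (0<=_bits<=25).*/
static uint32_t bitwin_look_sketch(bitwin_sketch *_b,int _bits){
  if(_bits==0)return 0;
  if(_bits>_b->bits){
    /*Top up the window one whole byte at a time, from the top down.*/
    while(_b->bits<=32-8&&_b->ptr<_b->stop){
      _b->bits+=8;
      _b->window|=(uint32_t)*_b->ptr++<<(32-_b->bits);
    }
  }
  return _b->window>>(32-_bits);
}

/*Consumes _bits bits (0<=_bits<32).*/
static void bitwin_adv_sketch(bitwin_sketch *_b,int _bits){
  _b->window<<=_bits;
  _b->bits-=_bits;
}
```

/*Keeping look and advance separate lets a table-driven Huffman decoder peek
   at a fixed number of bits, find the code length in the table, and only then
   consume the bits that were actually part of the code.*/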
/*The number of extra bits by which the log_2 of the size of a lookup table is
allowed to exceed the log_2 of the number of unique nodes it contains.
E.g., if OC_HUFF_SLUSH is 2, then at most 75% of the space in the tree is
wasted (each node will have an amortized cost of at most 20 bytes when using
4-byte pointers).
Larger numbers can decode tokens with fewer read operations, while smaller
numbers may save more space (requiring as little as 8 bytes amortized per
node, though there will be more nodes).
With a sample file:
32233473 read calls are required when no tree collapsing is done (100.0%).
19269269 read calls are required when OC_HUFF_SLUSH is 0 (59.8%).
11144969 read calls are required when OC_HUFF_SLUSH is 1 (34.6%).
10538563 read calls are required when OC_HUFF_SLUSH is 2 (32.7%).
10192578 read calls are required when OC_HUFF_SLUSH is 3 (31.6%).
Since a value of 1 gets us the vast majority of the speed-up with only a
small amount of wasted memory, this is what we use.*/
#define OC_HUFF_SLUSH (1)
/*Determines the size in bytes of a Huffman tree node that represents a
subtree of depth _nbits.
_nbits: The depth of the subtree.
If this is 0, the node is a leaf node.
Otherwise 1<<_nbits pointers are allocated for children.
Return: The number of bytes required to store the node.*/
static size_t oc_huff_node_size(int _nbits){
size_t size;
size=_ogg_offsetof(oc_huff_node,nodes);
if(_nbits>0)size+=sizeof(oc_huff_node *)*(1<<_nbits);
return size;
}
static oc_huff_node *oc_huff_node_init(char **_storage,size_t _size,int _nbits){
oc_huff_node *ret;
ret=(oc_huff_node *)*_storage;
ret->nbits=(unsigned char)_nbits;
(*_storage)+=_size;
return ret;
}
/*Determines the size in bytes of a Huffman tree.
_node: The root of the tree.
Return: The number of bytes required to store the tree.*/
static size_t oc_huff_tree_size(const oc_huff_node *_node){
size_t size;
size=oc_huff_node_size(_node->nbits);
if(_node->nbits){
int nchildren;
int i;
nchildren=1<<_node->nbits;
for(i=0;i<nchildren;i+=1<<_node->nbits-_node->nodes[i]->depth){
size+=oc_huff_tree_size(_node->nodes[i]);
}
}
return size;
}
/*Unpacks a sub-tree from the given buffer.
_opb: The buffer to unpack from.
_binodes: The nodes to store the sub-tree in.
_nbinodes: The number of nodes available for the sub-tree.
Return: 0 on success, or a negative value on error.*/
static int oc_huff_tree_unpack(oc_pack_buf *_opb,
oc_huff_node *_binodes,int _nbinodes){
oc_huff_node *binode;
long bits;
int nused;
if(_nbinodes<1)return TH_EBADHEADER;
binode=_binodes;
nused=0;
bits=oc_pack_read1(_opb);
if(oc_pack_bytes_left(_opb)<0)return TH_EBADHEADER;
/*Read an internal node:*/
if(!bits){
int ret;
nused++;
binode->nbits=1;
binode->depth=1;
binode->nodes[0]=_binodes+nused;
ret=oc_huff_tree_unpack(_opb,_binodes+nused,_nbinodes-nused);
if(ret>=0){
nused+=ret;
binode->nodes[1]=_binodes+nused;
ret=oc_huff_tree_unpack(_opb,_binodes+nused,_nbinodes-nused);
}
if(ret<0)return ret;
nused+=ret;
}
/*Read a leaf node:*/
else{
int ntokens;
int token;
int i;
bits=oc_pack_read(_opb,OC_NDCT_TOKEN_BITS);
if(oc_pack_bytes_left(_opb)<0)return TH_EBADHEADER;
/*Find out how many internal tokens we translate this external token into.*/
ntokens=OC_DCT_TOKEN_MAP_ENTRIES[bits];
if(_nbinodes<2*ntokens-1)return TH_EBADHEADER;
/*Fill in a complete binary tree pointing to the internal tokens.*/
for(i=1;i<ntokens;i<<=1){
int j;
binode=_binodes+nused;
nused+=i;
for(j=0;j<i;j++){
binode[j].nbits=1;
binode[j].depth=1;
binode[j].nodes[0]=_binodes+nused+2*j;
binode[j].nodes[1]=_binodes+nused+2*j+1;
}
}
/*And now the leaf nodes with those tokens.*/
token=OC_DCT_TOKEN_MAP[bits];
for(i=0;i<ntokens;i++){
binode=_binodes+nused++;
binode->nbits=0;
binode->depth=1;
binode->token=token+i;
}
}
return nused;
}
/*Finds the depth of the shortest branch of the given sub-tree.
The tree must be binary.
_binode: The root of the given sub-tree.
_binode->nbits must be 0 or 1.
Return: The smallest depth of a leaf node in this sub-tree.
0 indicates this sub-tree is a leaf node.*/
static int oc_huff_tree_mindepth(oc_huff_node *_binode){
int depth0;
int depth1;
if(_binode->nbits==0)return 0;
depth0=oc_huff_tree_mindepth(_binode->nodes[0]);
depth1=oc_huff_tree_mindepth(_binode->nodes[1]);
return OC_MINI(depth0,depth1)+1;
}
/*Finds the number of internal nodes at a given depth, plus the number of
leaves at that depth or shallower.
The tree must be binary.
_binode: The root of the given sub-tree.
_binode->nbits must be 0 or 1.
Return: The number of entries that would be contained in a jump table of the
given depth.*/
static int oc_huff_tree_occupancy(oc_huff_node *_binode,int _depth){
if(_binode->nbits==0||_depth<=0)return 1;
else{
return oc_huff_tree_occupancy(_binode->nodes[0],_depth-1)+
oc_huff_tree_occupancy(_binode->nodes[1],_depth-1);
}
}
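As a reading aid for the two helpers above, here is a minimal sketch that runs the same recursions on a toy tree for the code {0, 10, 110, 111}. The `demo_node` type and the `demo_*` functions are hypothetical stand-ins, not part of libtheora; the real code operates on `oc_huff_node` and tests `nbits` rather than NULL children.

```c
#include <assert.h>
#include <stddef.h>

/*Hypothetical stand-in for a binary oc_huff_node.
  A leaf is a node whose children are both NULL.*/
typedef struct demo_node demo_node;
struct demo_node{
  demo_node *kids[2];
};

/*Depth of the shallowest leaf below _n (0 if _n is itself a leaf).*/
int demo_mindepth(const demo_node *_n){
  int d0;
  int d1;
  assert(_n!=NULL);
  if(_n->kids[0]==NULL)return 0;
  d0=demo_mindepth(_n->kids[0]);
  d1=demo_mindepth(_n->kids[1]);
  return (d0<d1?d0:d1)+1;
}

/*Number of distinct nodes that a jump table of 1<<_depth slots would
   reference: internal nodes at exactly _depth, plus leaves at or above it.*/
int demo_occupancy(const demo_node *_n,int _depth){
  assert(_n!=NULL);
  if(_n->kids[0]==NULL||_depth<=0)return 1;
  return demo_occupancy(_n->kids[0],_depth-1)
   +demo_occupancy(_n->kids[1],_depth-1);
}

/*Builds the tree for the code {0, 10, 110, 111} in _nodes[0..6] and returns
   its root.*/
demo_node *demo_build(demo_node _nodes[7]){
  int i;
  for(i=0;i<7;i++)_nodes[i].kids[0]=_nodes[i].kids[1]=NULL;
  _nodes[0].kids[0]=_nodes+1; /*Leaf "0".*/
  _nodes[0].kids[1]=_nodes+2;
  _nodes[2].kids[0]=_nodes+3; /*Leaf "10".*/
  _nodes[2].kids[1]=_nodes+4;
  _nodes[4].kids[0]=_nodes+5; /*Leaf "110".*/
  _nodes[4].kids[1]=_nodes+6; /*Leaf "111".*/
  return _nodes;
}
```

For this tree the shallowest leaf is at depth 1, and a depth-3 table would have 8 slots but only 4 distinct entries. The collapse code weighs this kind of ratio when it decides how many bits a table should cover.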
/*Makes a copy of the given Huffman tree.
_node: The Huffman tree to copy.
Return: The copy of the Huffman tree.*/
static oc_huff_node *oc_huff_tree_copy(const oc_huff_node *_node,
char **_storage){
oc_huff_node *ret;
ret=oc_huff_node_init(_storage,oc_huff_node_size(_node->nbits),_node->nbits);
ret->depth=_node->depth;
if(_node->nbits){
int nchildren;
int i;
int inext;
nchildren=1<<_node->nbits;
for(i=0;i<nchildren;){
ret->nodes[i]=oc_huff_tree_copy(_node->nodes[i],_storage);
inext=i+(1<<_node->nbits-ret->nodes[i]->depth);
while(++i<inext)ret->nodes[i]=ret->nodes[i-1];
}
}
else ret->token=_node->token;
return ret;
}
static size_t oc_huff_tree_collapse_size(oc_huff_node *_binode,int _depth){
size_t size;
int mindepth;
int depth;
int loccupancy;
int occupancy;
if(_binode->nbits!=0&&_depth>0){
return oc_huff_tree_collapse_size(_binode->nodes[0],_depth-1)+
oc_huff_tree_collapse_size(_binode->nodes[1],_depth-1);
}
depth=mindepth=oc_huff_tree_mindepth(_binode);
occupancy=1<<mindepth;
do{
loccupancy=occupancy;
occupancy=oc_huff_tree_occupancy(_binode,++depth);
}
while(occupancy>loccupancy&&occupancy>=1<<OC_MAXI(depth-OC_HUFF_SLUSH,0));
depth--;
size=oc_huff_node_size(depth);
if(depth>0){
size+=oc_huff_tree_collapse_size(_binode->nodes[0],depth-1);
size+=oc_huff_tree_collapse_size(_binode->nodes[1],depth-1);
}
return size;
}
static oc_huff_node *oc_huff_tree_collapse(oc_huff_node *_binode,
char **_storage);
/*Fills the given nodes table with all the children in the sub-tree at the
given depth.
The nodes in the sub-tree with a depth less than that stored in the table
are freed.
The sub-tree must be binary and complete up until the given depth.
_nodes: The nodes table to fill.
_binode: The root of the sub-tree to fill it with.
_binode->nbits must be 0 or 1.
_level: The current level in the table.
0 indicates that the current node should be stored, regardless of
whether it is a leaf node or an internal node.
_depth: The depth of the nodes to fill the table with, relative to their
parent.*/
static void oc_huff_node_fill(oc_huff_node **_nodes,
oc_huff_node *_binode,int _level,int _depth,char **_storage){
if(_level<=0||_binode->nbits==0){
int i;
_binode->depth=(unsigned char)(_depth-_level);
_nodes[0]=oc_huff_tree_collapse(_binode,_storage);
for(i=1;i<1<<_level;i++)_nodes[i]=_nodes[0];
}
else{
_level--;
oc_huff_node_fill(_nodes,_binode->nodes[0],_level,_depth,_storage);
_nodes+=1<<_level;
oc_huff_node_fill(_nodes,_binode->nodes[1],_level,_depth,_storage);
}
}
/*Finds the largest complete sub-tree rooted at the current node and collapses
it into a single node.
This procedure is then applied recursively to all the children of that node.
_binode: The root of the sub-tree to collapse.
_binode->nbits must be 0 or 1.
Return: The new root of the collapsed sub-tree.*/
static oc_huff_node *oc_huff_tree_collapse(oc_huff_node *_binode,
char **_storage){
oc_huff_node *root;
size_t size;
int mindepth;
int depth;
int loccupancy;
int occupancy;
depth=mindepth=oc_huff_tree_mindepth(_binode);
occupancy=1<<mindepth;
do{
loccupancy=occupancy;
occupancy=oc_huff_tree_occupancy(_binode,++depth);
}
while(occupancy>loccupancy&&occupancy>=1<<OC_MAXI(depth-OC_HUFF_SLUSH,0));
depth--;
if(depth<=1)return oc_huff_tree_copy(_binode,_storage);
size=oc_huff_node_size(depth);
root=oc_huff_node_init(_storage,size,depth);
root->depth=_binode->depth;
oc_huff_node_fill(root->nodes,_binode,depth,depth,_storage);
return root;
}
/*Unpacks a set of Huffman trees, and reduces them to a collapsed
representation.
_opb: The buffer to unpack the trees from.
_nodes: The table to fill with the Huffman trees.
Return: 0 on success, or a negative value on error.*/
int oc_huff_trees_unpack(oc_pack_buf *_opb,
oc_huff_node *_nodes[TH_NHUFFMAN_TABLES]){
int i;
for(i=0;i<TH_NHUFFMAN_TABLES;i++){
oc_huff_node nodes[511];
char *storage;
size_t size;
int ret;
/*Unpack the full tree into a temporary buffer.*/
ret=oc_huff_tree_unpack(_opb,nodes,sizeof(nodes)/sizeof(*nodes));
if(ret<0)return ret;
/*Figure out how big the collapsed tree will be.*/
size=oc_huff_tree_collapse_size(nodes,0);
storage=(char *)_ogg_calloc(1,size);
if(storage==NULL)return TH_EFAULT;
/*And collapse it.*/
_nodes[i]=oc_huff_tree_collapse(nodes,&storage);
}
return 0;
}
/*Makes a copy of the given set of Huffman trees.
_dst: The array to store the copy in.
_src: The array of trees to copy.
Return: 0 on success, or TH_EFAULT if memory could not be allocated.*/
int oc_huff_trees_copy(oc_huff_node *_dst[TH_NHUFFMAN_TABLES],
const oc_huff_node *const _src[TH_NHUFFMAN_TABLES]){
int i;
for(i=0;i<TH_NHUFFMAN_TABLES;i++){
size_t size;
char *storage;
size=oc_huff_tree_size(_src[i]);
storage=(char *)_ogg_calloc(1,size);
if(storage==NULL){
while(i-->0)_ogg_free(_dst[i]);
return TH_EFAULT;
}
_dst[i]=oc_huff_tree_copy(_src[i],&storage);
}
return 0;
}
/*Frees the memory used by a set of Huffman trees.
_nodes: The array of trees to free.*/
void oc_huff_trees_clear(oc_huff_node *_nodes[TH_NHUFFMAN_TABLES]){
int i;
for(i=0;i<TH_NHUFFMAN_TABLES;i++)_ogg_free(_nodes[i]);
}
/*Unpacks a single token using the given Huffman tree.
_opb: The buffer to unpack the token from.
_node: The tree to unpack the token with.
Return: The token value.*/
int oc_huff_token_decode(oc_pack_buf *_opb,const oc_huff_node *_node){
long bits;
while(_node->nbits!=0){
bits=oc_pack_look(_opb,_node->nbits);
_node=_node->nodes[bits];
oc_pack_adv(_opb,_node->depth);
}
return _node->token;
}
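The decode loop above peeks `nbits` bits, indexes the node's table, and then advances by the child's `depth`, which can be less than the number of bits peeked. The sketch below illustrates that idea on a toy table for the code {0, 10, 11}. The `mini_node` and `mini_reader` types and the string-of-characters bit reader are assumptions made for illustration; they are not the libtheora `oc_huff_node` or `oc_pack_buf` definitions.

```c
#include <assert.h>
#include <stddef.h>

/*Hypothetical miniature of a collapsed Huffman node.*/
typedef struct mini_node mini_node;
struct mini_node{
  int nbits; /*Bits to peek at this node; 0 marks a leaf.*/
  int depth; /*Bits actually consumed to reach this node from its parent.*/
  int token; /*Only meaningful for leaves.*/
  const mini_node *nodes[4]; /*Only the first 1<<nbits entries are used.*/
};

/*Hypothetical bit reader over a string of '0' and '1' characters.*/
typedef struct{
  const char *bits;
  size_t len;
  size_t pos;
}mini_reader;

/*Peeks _n bits, MSB first, without consuming them.
  Positions past the end of the input read as 0.*/
static long mini_look(const mini_reader *_r,int _n){
  long v;
  size_t p;
  int i;
  v=0;
  p=_r->pos;
  for(i=0;i<_n;i++,p++)v=(v<<1)|(p<_r->len&&_r->bits[p]=='1');
  return v;
}

/*Same shape as the loop in oc_huff_token_decode(): look, index, advance.*/
int mini_decode(mini_reader *_r,const mini_node *_node){
  assert(_r!=NULL&&_node!=NULL);
  while(_node->nbits!=0){
    long bits;
    bits=mini_look(_r,_node->nbits);
    _node=_node->nodes[bits];
    _r->pos+=_node->depth;
  }
  return _node->token;
}

/*Code {0 -> 7, 10 -> 8, 11 -> 9} collapsed into one 2-bit table.
  The 1-bit code "0" fills both slots 00 and 01, but consumes only 1 bit.*/
static const mini_node MINI_A={0,1,7,{NULL}};
static const mini_node MINI_B={0,2,8,{NULL}};
static const mini_node MINI_C={0,2,9,{NULL}};
const mini_node MINI_ROOT={2,0,-1,{&MINI_A,&MINI_A,&MINI_B,&MINI_C}};
```

Decoding the input `01011` with `MINI_ROOT` should yield tokens 7, 8, and 9 while consuming 1, 2, and 2 bits respectively. Duplicating short codes across table slots is what lets a single lookup replace several single-bit branches.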

View file

@ -5,19 +5,20 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: huffdec.h 15400 2008-10-15 12:10:58Z tterribe $
last mod: $Id: huffdec.h 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
#if !defined(_huffdec_H)
# define _huffdec_H (1)
# include "huffman.h"
# include "bitpack.h"
@ -75,17 +76,17 @@ struct oc_huff_node{
The ACTUAL size of this array is 1<<nbits, despite what the declaration
below claims.
The exception is that for leaf nodes the size is 0.*/
oc_huff_node *nodes[1];
oc_huff_node *nodes[2];
};
int oc_huff_trees_unpack(oggpack_buffer *_opb,
int oc_huff_trees_unpack(oc_pack_buf *_opb,
oc_huff_node *_nodes[TH_NHUFFMAN_TABLES]);
void oc_huff_trees_copy(oc_huff_node *_dst[TH_NHUFFMAN_TABLES],
int oc_huff_trees_copy(oc_huff_node *_dst[TH_NHUFFMAN_TABLES],
const oc_huff_node *const _src[TH_NHUFFMAN_TABLES]);
void oc_huff_trees_clear(oc_huff_node *_nodes[TH_NHUFFMAN_TABLES]);
int oc_huff_token_decode(oggpack_buffer *_opb,const oc_huff_node *_node);
int oc_huff_token_decode(oc_pack_buf *_opb,const oc_huff_node *_node);
#endif

View file

@ -1,19 +1,11 @@
#include <stdlib.h>
#include <string.h>
#include "theora/theoraenc.h"
#include "theora/theora.h"
#include "codec_internal.h"
#include "../dec/ocintrin.h"
/*Wrapper to translate the new API into the old API.
Eventually we need to convert the old functions to support the new API
natively and do the translation the other way.
theora-exp already has the necessary code to do so.*/
#include <ogg/ogg.h>
#include "huffenc.h"
/*The default Huffman codes used for VP3.1.
It's kind of useless to include this, as TH_ENCCTL_SET_HUFFMAN_CODES is not
actually implemented in the old encoder, but it's part of the public API.*/
/*The default Huffman codes used for VP3.1.*/
const th_huff_code TH_VP31_HUFF_CODES[TH_NHUFFMAN_TABLES][TH_NDCT_TOKENS]={
{
{0x002D, 6},{0x0026, 7},{0x0166, 9},{0x004E, 8},
@ -819,323 +811,100 @@ const th_huff_code TH_VP31_HUFF_CODES[TH_NHUFFMAN_TABLES][TH_NDCT_TOKENS]={
static void th_info2theora_info(theora_info *_ci,const th_info *_info){
_ci->version_major=_info->version_major;
_ci->version_minor=_info->version_minor;
_ci->version_subminor=_info->version_subminor;
_ci->width=_info->frame_width;
_ci->height=_info->frame_height;
_ci->frame_width=_info->pic_width;
_ci->frame_height=_info->pic_height;
_ci->offset_x=_info->pic_x;
_ci->offset_y=_info->pic_y;
_ci->fps_numerator=_info->fps_numerator;
_ci->fps_denominator=_info->fps_denominator;
_ci->aspect_numerator=_info->aspect_numerator;
_ci->aspect_denominator=_info->aspect_denominator;
switch(_info->colorspace){
case TH_CS_ITU_REC_470M:_ci->colorspace=OC_CS_ITU_REC_470M;break;
case TH_CS_ITU_REC_470BG:_ci->colorspace=OC_CS_ITU_REC_470BG;break;
default:_ci->colorspace=OC_CS_UNSPECIFIED;break;
}
switch(_info->pixel_fmt){
case TH_PF_420:_ci->pixelformat=OC_PF_420;break;
case TH_PF_422:_ci->pixelformat=OC_PF_422;break;
case TH_PF_444:_ci->pixelformat=OC_PF_444;break;
default:_ci->pixelformat=OC_PF_RSVD;
}
_ci->target_bitrate=_info->target_bitrate;
_ci->quality=_info->quality;
_ci->codec_setup=NULL;
/*Defaults from old encoder_example... eventually most of these should go
away when we make the encoder no longer use them.*/
_ci->dropframes_p=0;
_ci->keyframe_auto_p=1;
_ci->keyframe_frequency=1<<_info->keyframe_granule_shift;
_ci->keyframe_frequency_force=1<<_info->keyframe_granule_shift;
_ci->keyframe_data_target_bitrate=
_info->target_bitrate+(_info->target_bitrate>>1);
_ci->keyframe_auto_threshold=80;
_ci->keyframe_mindistance=8;
_ci->noise_sensitivity=1;
_ci->sharpness=0;
_ci->quick_p=1;
/*A description of a Huffman code value used when encoding the tree.*/
typedef struct{
/*The bit pattern, left-shifted so that the MSB of all patterns is
aligned.*/
ogg_uint32_t pattern;
/*The amount the bit pattern was shifted.*/
int shift;
/*The token this bit pattern represents.*/
int token;
}oc_huff_entry;
/*Compares two oc_huff_entry structures by their bit patterns.
_c1: The first entry to compare.
_c2: The second entry to compare.
Return: <0 if _c1<_c2, >0 if _c1>_c2.*/
static int huff_entry_cmp(const void *_c1,const void *_c2){
ogg_uint32_t b1;
ogg_uint32_t b2;
b1=((const oc_huff_entry *)_c1)->pattern;
b2=((const oc_huff_entry *)_c2)->pattern;
return b1<b2?-1:b1>b2?1:0;
}
static int _ilog(unsigned _v){
int ret;
for(ret=0;_v;ret++)_v>>=1;
return ret;
}
struct th_enc_ctx{
/*This is required at the start of the struct for the common functions to
work.*/
th_info info;
/*The actual encoder.*/
theora_state state;
/*A temporary buffer for input frames.
This is needed if the U and V strides differ, or padding is required.*/
unsigned char *buf;
};
th_enc_ctx *th_encode_alloc(const th_info *_info){
theora_info ci;
th_enc_ctx *enc;
th_info2theora_info(&ci,_info);
/*Do a bunch of checks the new API does, but the old one didn't.*/
if((_info->frame_width&0xF)||(_info->frame_height&0xF)||
_info->frame_width>=0x100000||_info->frame_height>=0x100000||
_info->pic_x+_info->pic_width>_info->frame_width||
_info->pic_y+_info->pic_height>_info->frame_height||
_info->pic_x>255||
_info->frame_height-_info->pic_height-_info->pic_y>255||
_info->colorspace<0||_info->colorspace>=TH_CS_NSPACES||
_info->pixel_fmt<0||_info->pixel_fmt>=TH_PF_NFORMATS){
enc=NULL;
}
else{
enc=(th_enc_ctx *)_ogg_malloc(sizeof(*enc));
if(theora_encode_init(&enc->state,&ci)<0){
_ogg_free(enc);
enc=NULL;
/*Encodes a description of the given Huffman tables.
Although the codes are stored in the encoder as flat arrays, in the bit
stream and in the decoder they are structured as a tree.
This function recovers the tree structure from the flat array and then
writes it out.
Note that the codes MUST form a Huffman code, and not merely a prefix-free
code, since the binary tree is assumed to be full.
_opb: The buffer to store the tree in.
_codes: The Huffman tables to pack.
Return: 0 on success, or a negative value if one of the given Huffman tables
does not form a full, prefix-free code.*/
int oc_huff_codes_pack(oggpack_buffer *_opb,
const th_huff_code _codes[TH_NHUFFMAN_TABLES][TH_NDCT_TOKENS]){
int i;
for(i=0;i<TH_NHUFFMAN_TABLES;i++){
oc_huff_entry entries[TH_NDCT_TOKENS];
int bpos;
int maxlen;
int mask;
int j;
/*First, find the maximum code length so we can align all the bit
patterns.*/
maxlen=_codes[i][0].nbits;
for(j=1;j<TH_NDCT_TOKENS;j++){
maxlen=OC_MAXI(_codes[i][j].nbits,maxlen);
}
else{
if(_info->frame_width>_info->pic_width||
_info->frame_height>_info->pic_height){
enc->buf=_ogg_malloc((_info->frame_width*_info->frame_height+
((_info->frame_width>>!(_info->pixel_fmt&1))*
(_info->frame_height>>!(_info->pixel_fmt&2))<<1))*sizeof(*enc->buf));
}
else enc->buf=NULL;
memcpy(&enc->info,_info,sizeof(enc->info));
/*Overwrite values theora_encode_init() can change; don't trust the user.*/
enc->info.version_major=ci.version_major;
enc->info.version_minor=ci.version_minor;
enc->info.version_subminor=ci.version_subminor;
enc->info.quality=ci.quality;
enc->info.target_bitrate=ci.target_bitrate;
enc->info.fps_numerator=ci.fps_numerator;
enc->info.fps_denominator=ci.fps_denominator;
enc->info.keyframe_granule_shift=_ilog(ci.keyframe_frequency_force-1);
mask=(1<<(maxlen>>1)<<(maxlen+1>>1))-1;
/*Copy over the codes into our temporary workspace.
The bit patterns are aligned, and the original entry each code is from
is stored as well.*/
for(j=0;j<TH_NDCT_TOKENS;j++){
entries[j].shift=maxlen-_codes[i][j].nbits;
entries[j].pattern=_codes[i][j].pattern<<entries[j].shift&mask;
entries[j].token=j;
}
}
return enc;
}
int th_encode_ctl(th_enc_ctx *_enc,int _req,void *_buf,size_t _buf_sz){
return theora_control(&_enc->state,_req,_buf,_buf_sz);
}
int th_encode_flushheader(th_enc_ctx *_enc,th_comment *_comments,
ogg_packet *_op){
theora_state *te;
CP_INSTANCE *cpi;
if(_enc==NULL||_op==NULL)return OC_FAULT;
te=&_enc->state;
cpi=(CP_INSTANCE *)te->internal_encode;
switch(cpi->doneflag){
case -3:{
theora_encode_header(te,_op);
return -cpi->doneflag++;
}break;
case -2:{
if(_comments==NULL)return OC_FAULT;
theora_encode_comment((theora_comment *)_comments,_op);
/*The old API does not require a theora_state struct when writing the
comment header, so it can't use its internal buffer and relies on the
application to free it.
The old documentation is wrong on this subject, and this breaks on
Windows when linking against multiple versions of libc (which is
almost always done when, e.g., using DLLs built with mingw32).
The new API _does_ require a th_enc_ctx, and states that libtheora owns
the memory.
Thus we move the contents of this packet into our internal
oggpack_buffer so it can be properly reclaimed.*/
oggpackB_reset(cpi->oggbuffer);
oggpackB_writecopy(cpi->oggbuffer,_op->packet,_op->bytes*8);
_ogg_free(_op->packet);
_op->packet=oggpackB_get_buffer(cpi->oggbuffer);
return -cpi->doneflag++;
}break;
case -1:{
theora_encode_tables(te,_op);
return -cpi->doneflag++;
}break;
case 0:return 0;
default:return OC_EINVAL;
}
}
/*Copies the picture region of the _src image plane into _dst and pads the rest
of _dst using a diffusion extension method.
We could do much better (e.g., the DCT-based low frequency extension method
in theora-exp's fdct.c) if we were to pad after motion compensation, but
that would require significant changes to the encoder.*/
static unsigned char *th_encode_copy_pad_plane(th_img_plane *_dst,
unsigned char *_buf,th_img_plane *_src,
ogg_uint32_t _pic_x,ogg_uint32_t _pic_y,
ogg_uint32_t _pic_width,ogg_uint32_t _pic_height){
size_t buf_sz;
_dst->width=_src->width;
_dst->height=_src->height;
_dst->stride=_src->width;
_dst->data=_buf;
buf_sz=_dst->width*_dst->height*sizeof(*_dst->data);
/*If we have _no_ data, just encode a dull green.*/
if(_pic_width==0||_pic_height==0)memset(_dst->data,0,buf_sz);
else{
unsigned char *dst;
unsigned char *src;
ogg_uint32_t x;
ogg_uint32_t y;
int dstride;
int sstride;
/*Step 1: Copy the data we do have.*/
dstride=_dst->stride;
sstride=_src->stride;
dst=_dst->data+_pic_y*dstride+_pic_x;
src=_src->data+_pic_y*sstride+_pic_x;
for(y=0;y<_pic_height;y++){
memcpy(dst,src,_pic_width);
dst+=dstride;
src+=sstride;
}
/*Step 2: Copy the border into any blocks that are 100% padding.
There's probably smarter things we could do than this.*/
/*Left side.*/
for(x=_pic_x;x-->0;){
dst=_dst->data+_pic_y*dstride+x;
for(y=0;y<_pic_height;y++){
dst[0]=(dst[1]<<1)+(dst-(dstride&-(y>0)))[1]+
(dst+(dstride&-(y+1<_pic_height)))[1]+2>>2;
dst+=dstride;
/*Sort the codes into ascending order.
This is the order the leaves of the tree will be traversed.*/
qsort(entries,TH_NDCT_TOKENS,sizeof(entries[0]),huff_entry_cmp);
/*For each leaf of the tree:*/
bpos=maxlen;
for(j=0;j<TH_NDCT_TOKENS;j++){
int bit;
/*If this code has any bits at all.*/
if(entries[j].shift<maxlen){
/*Descend into the tree, writing a bit for each branch.*/
for(;bpos>entries[j].shift;bpos--)oggpackB_write(_opb,0,1);
/*Mark this as a leaf node, and write its value.*/
oggpackB_write(_opb,1,1);
oggpackB_write(_opb,entries[j].token,5);
/*For each 1 branch we've descended, back up the tree until we reach a
0 branch.*/
bit=1<<bpos;
for(;entries[j].pattern&bit;bpos++)bit<<=1;
/*Validate the code.*/
if(j+1<TH_NDCT_TOKENS){
mask=~(bit-1)<<1;
/*The next entry should have a 1 bit where we had a 0, and should
match our code above that bit.
This verifies both fullness and prefix-freeness simultaneously.*/
if(!(entries[j+1].pattern&bit)||
(entries[j].pattern&mask)!=(entries[j+1].pattern&mask)){
return TH_EINVAL;
}
}
/*If there are no more codes, we should have ascended back to the top
of the tree.*/
else if(bpos<maxlen)return TH_EINVAL;
}
}
/*Right side.*/
for(x=_pic_x+_pic_width;x<_dst->width;x++){
dst=_dst->data+_pic_y*dstride+x-1;
for(y=0;y<_pic_height;y++){
dst[1]=(dst[0]<<1)+(dst-(dstride&-(y>0)))[0]+
(dst+(dstride&-(y+1<_pic_height)))[0]+2>>2;
dst+=dstride;
}
}
/*Top.*/
dst=_dst->data+_pic_y*dstride;
for(y=_pic_y;y-->0;){
for(x=0;x<_dst->width;x++){
(dst-dstride)[x]=(dst[x]<<1)+dst[x-(x>0)]+dst[x+(x+1<_dst->width)]+2>>2;
}
dst-=dstride;
}
/*Bottom.*/
dst=_dst->data+(_pic_y+_pic_height)*dstride;
for(y=_pic_y+_pic_height;y<_dst->height;y++){
for(x=0;x<_dst->width;x++){
dst[x]=((dst-dstride)[x]<<1)+(dst-dstride)[x-(x>0)]+
(dst-dstride)[x+(x+1<_dst->width)]+2>>2;
}
dst+=dstride;
}
}
_buf+=buf_sz;
return _buf;
}
int th_encode_ycbcr_in(th_enc_ctx *_enc,th_ycbcr_buffer _ycbcr){
CP_INSTANCE *cpi;
theora_state *te;
th_img_plane *pycbcr;
th_ycbcr_buffer ycbcr;
yuv_buffer yuv;
ogg_uint32_t pic_width;
ogg_uint32_t pic_height;
int hdec;
int vdec;
int ret;
if(_enc==NULL||_ycbcr==NULL)return OC_FAULT;
te=&_enc->state;
/*theora_encode_YUVin() does not bother to check uv_width and uv_height, and
then uses them.
This is arguably okay (it will most likely lead to a crash if they're
wrong, which will make the developer who passed them fix the problem), but
our API promises to return an error code instead.*/
cpi=(CP_INSTANCE *)te->internal_encode;
hdec=!(cpi->pb.info.pixelformat&1);
vdec=!(cpi->pb.info.pixelformat&2);
if(_ycbcr[0].width!=cpi->pb.info.width||
_ycbcr[0].height!=cpi->pb.info.height||
_ycbcr[1].width!=_ycbcr[0].width>>hdec||
_ycbcr[1].height!=_ycbcr[0].height>>vdec||
_ycbcr[2].width!=_ycbcr[1].width||_ycbcr[2].height!=_ycbcr[1].height){
return OC_EINVAL;
}
pic_width=cpi->pb.info.frame_width;
pic_height=cpi->pb.info.frame_height;
/*We can only directly use the input buffer if no padding is required (since
the new API is documented not to use values outside the picture region)
and if the strides for the Cb and Cr planes are the same, since the old
API had no way to specify different ones.*/
if(_ycbcr[0].width==pic_width&&_ycbcr[0].height==pic_height&&
_ycbcr[1].stride==_ycbcr[2].stride){
pycbcr=_ycbcr;
}
else{
unsigned char *buf;
int pic_x;
int pic_y;
int pli;
pic_x=cpi->pb.info.offset_x;
pic_y=cpi->pb.info.offset_y;
if(_ycbcr[0].width>pic_width||_ycbcr[0].height>pic_height){
buf=th_encode_copy_pad_plane(ycbcr+0,_enc->buf,_ycbcr+0,
pic_x,pic_y,pic_width,pic_height);
}
else{
/*If only the strides differ, we can still avoid copying the luma plane.*/
memcpy(ycbcr+0,_ycbcr+0,sizeof(ycbcr[0]));
if(_enc->buf==NULL){
_enc->buf=(unsigned char *)_ogg_malloc(
(_ycbcr[1].width*_ycbcr[1].height<<1)*sizeof(*_enc->buf));
}
buf=_enc->buf;
}
for(pli=1;pli<3;pli++){
int x0;
int y0;
x0=pic_x>>hdec;
y0=pic_y>>vdec;
buf=th_encode_copy_pad_plane(ycbcr+pli,buf,_ycbcr+pli,
x0,y0,(pic_x+pic_width+hdec>>hdec)-x0,(pic_y+pic_height+vdec>>vdec)-y0);
}
pycbcr=ycbcr;
}
yuv.y_width=pycbcr[0].width;
yuv.y_height=pycbcr[0].height;
yuv.uv_width=pycbcr[1].width;
yuv.uv_height=pycbcr[1].height;
yuv.y_stride=pycbcr[0].stride;
yuv.y=pycbcr[0].data;
yuv.uv_stride=pycbcr[1].stride;
yuv.u=pycbcr[1].data;
yuv.v=pycbcr[2].data;
ret=theora_encode_YUVin(te,&yuv);
return ret;
}
int th_encode_packetout(th_enc_ctx *_enc,int _last,ogg_packet *_op){
if(_enc==NULL)return OC_FAULT;
return theora_encode_packetout(&_enc->state,_last,_op);
}
void th_encode_free(th_enc_ctx *_enc){
if(_enc!=NULL){
theora_clear(&_enc->state);
_ogg_free(_enc->buf);
_ogg_free(_enc);
}
return 0;
}
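The approach in oc_huff_codes_pack() (align the patterns, sort them, then walk the leaves while writing one bit per branch) can be tried on a small table. The following is a simplified sketch under stated assumptions: `demo_code`, `demo_entry`, and `demo_codes_pack` are hypothetical names, the output is a string of '0' and '1' characters rather than an `oggpack_buffer`, and the limits of 32 codes and 16-bit lengths are arbitrary choices for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_CODES (32)

typedef struct{
  unsigned pattern; /*Code bits, right-aligned.*/
  int nbits;        /*Code length in bits.*/
}demo_code;

typedef struct{
  unsigned pattern; /*Code bits, left-aligned to the longest code.*/
  int shift;        /*How far the pattern was shifted.*/
  int token;        /*Index of the code in the input table.*/
}demo_entry;

static int demo_entry_cmp(const void *_a,const void *_b){
  unsigned a;
  unsigned b;
  a=((const demo_entry *)_a)->pattern;
  b=((const demo_entry *)_b)->pattern;
  return a<b?-1:a>b?1:0;
}

/*Writes '0' for each internal node and '1' plus a 5-bit token for each leaf,
   in depth-first order.
  Return: 0 on success, or -1 if the arguments are out of range or the codes
   do not form a full, prefix-free code.*/
int demo_codes_pack(char *_out,size_t _out_sz,
 const demo_code *_codes,int _ncodes){
  demo_entry entries[DEMO_MAX_CODES];
  unsigned mask;
  unsigned above;
  unsigned bit;
  size_t n;
  int maxlen;
  int bpos;
  int j;
  int k;
  assert(_out!=NULL&&_codes!=NULL);
  if(_ncodes<2||_ncodes>DEMO_MAX_CODES)return -1;
  maxlen=0;
  for(j=0;j<_ncodes;j++){
    if(_codes[j].nbits<1||_codes[j].nbits>16)return -1;
    if(_codes[j].nbits>maxlen)maxlen=_codes[j].nbits;
  }
  /*Each entry emits at most maxlen branch bits plus a 6-character leaf.*/
  if(_out_sz<(size_t)_ncodes*(size_t)(maxlen+6)+1)return -1;
  /*Zero the buffer so the result is always NUL-terminated.*/
  memset(_out,0,_out_sz);
  mask=(1u<<maxlen)-1;
  for(j=0;j<_ncodes;j++){
    entries[j].shift=maxlen-_codes[j].nbits;
    entries[j].pattern=(_codes[j].pattern<<entries[j].shift)&mask;
    entries[j].token=j;
  }
  /*Ascending pattern order is the order the leaves are visited.*/
  qsort(entries,(size_t)_ncodes,sizeof(entries[0]),demo_entry_cmp);
  n=0;
  bpos=maxlen;
  for(j=0;j<_ncodes;j++){
    /*Descend, writing one '0' per internal node, then write the leaf.*/
    for(;bpos>entries[j].shift;bpos--)_out[n++]='0';
    _out[n++]='1';
    for(k=4;k>=0;k--)_out[n++]=(char)('0'+((entries[j].token>>k)&1));
    /*Back up past every 1 branch we descended.*/
    bit=1u<<bpos;
    for(;entries[j].pattern&bit;bpos++)bit<<=1;
    if(j+1<_ncodes){
      /*The next code must flip this bit to 1 and agree above it.*/
      above=(~(bit-1))<<1;
      if(!(entries[j+1].pattern&bit)||
       (entries[j].pattern&above)!=(entries[j+1].pattern&above)){
        return -1;
      }
    }
    else if(bpos<maxlen)return -1;
  }
  return 0;
}
```

For the code {0, 10, 11} this produces `0` (root), a leaf for token 0, `0` (inner node), then leaves for tokens 1 and 2. Dropping the last code leaves the tree incomplete, which the final `bpos<maxlen` check rejects.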

View file

@ -0,0 +1,19 @@
#if !defined(_huffenc_H)
# define _huffenc_H (1)
# include "huffman.h"
typedef th_huff_code th_huff_table[TH_NDCT_TOKENS];
extern const th_huff_code
TH_VP31_HUFF_CODES[TH_NHUFFMAN_TABLES][TH_NDCT_TOKENS];
int oc_huff_codes_pack(oggpack_buffer *_opb,
const th_huff_code _codes[TH_NHUFFMAN_TABLES][TH_NDCT_TOKENS]);
#endif

View file

@ -5,13 +5,13 @@
* GOVERNED BY A BSD-STYLE SOURCE LICENSE INCLUDED WITH THIS SOURCE *
* IN 'COPYING'. PLEASE READ THESE TERMS BEFORE DISTRIBUTING. *
* *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2007 *
* THE Theora SOURCE CODE IS COPYRIGHT (C) 2002-2009 *
* by the Xiph.Org Foundation and contributors http://www.xiph.org/ *
* *
********************************************************************
function:
last mod: $Id: huffman.h 15400 2008-10-15 12:10:58Z tterribe $
last mod: $Id: huffman.h 16503 2009-08-22 18:14:02Z giles $
********************************************************************/
@ -65,6 +65,6 @@
#define OC_NDCT_RUN_MAX (32)
#define OC_NDCT_RUN_CAT1A_MAX (28)
extern const int OC_DCT_TOKEN_EXTRA_BITS[TH_NDCT_TOKENS];
extern const unsigned char OC_DCT_TOKEN_EXTRA_BITS[TH_NDCT_TOKENS];
#endif

Some files were not shown because too many files have changed in this diff